The Communicative Multiagent Team Decision Problem:
Analyzing Teamwork Theories and Models 



Despite the significant progress in multiagent teamwork, existing research addresses neither
the optimality of its prescriptions nor the complexity of the teamwork problem. Without
a characterization of the optimality-complexity tradeoffs, it is impossible to determine
whether the assumptions and approximations made by a particular theory gain enough 
efficiency to justify the losses in overall performance. To provide a tool for use by mul- 
tiagent researchers in evaluating this tradeoff, we present a unified framework, the COM- 
municative Multiagent Team Decision Problem (COM-MTDP). The COM-MTDP model 
combines and extends existing multiagent theories, such as decentralized partially observ- 
able Markov decision processes and economic team theory. In addition to their generality 
of representation, COM-MTDPs also support the analysis of both the optimality of team 
performance and the computational complexity of the agents' decision problem. In analyz- 
ing complexity, we present a breakdown of the computational complexity of constructing 
optimal teams under various classes of problem domains, along the dimensions of observ- 
ability and communication cost. In analyzing optimality, we exploit the COM-MTDP's 
ability to encode existing teamwork theories and models to encode two instantiations of 
joint intentions theory taken from the literature. Furthermore, the COM-MTDP model 
provides a basis for the development of novel team coordination algorithms. We derive a 
domain-independent criterion for optimal communication and provide a comparative anal- 
ysis of the two joint intentions instantiations with respect to this optimal policy. We have 
implemented a reusable, domain-independent software package based on COM-MTDPs to 
analyze teamwork coordination strategies, and we demonstrate its use by encoding and 
evaluating the two joint intentions strategies within an example domain. 

1. Introduction 

A central challenge in the control and coordination of distributed agents is enabling them 
to work together, as a team, toward a common goal. Such teamwork is critical in a vast 
range of domains — for future teams of orbiting spacecraft, sensors for tracking targets, un- 
manned vehicles for urban battlefields, software agents for assisting organizations in rapid 
crisis response, etc. Research in teamwork theory has built the foundations for successful 
practical agent team implementations in such domains. On the forefront are theories based 
on belief-desire-intentions (BDI) frameworks, such as joint intentions (Cohen & Levesque, 
1991b, 1991a; Levesque, Cohen, & Nunes, 1990), SharedPlans (Grosz, 1996; Grosz & Kraus,
1996; Grosz & Sidner, 1990), and others (Sonenberg, Tidhar, Werner, Kinny, Ljungberg,
& Rao, 1994; Dunin-Keplicz & Verbrugge, 1996), that have provided prescriptions for
coordination in practical systems. These theories have inspired the construction of
practical, domain-independent teamwork models and architectures (Jennings, 1995; Pynadath,
Tambe, Chauvat, & Cavedon, 1999; Rich & Sidner, 1997; Tambe, 1997; Yen, Yin, Ioerger,
Miller, Xu, & Volz, 2001), successfully applied in a range of complex domains.

Yet, two key shortcomings limit the scalability of these BDI-based theories and imple- 
mentations. First, there are no techniques for the quantitative evaluation of the degree of 
optimality of their coordination behavior. While optimal teamwork may be impractical in 
real-world domains, such analysis would aid us in comparison of different theories/models 
and in identifying feasible improvements. One key reason for the difficulty in quantitative 
evaluation of most existing teamwork theories is that they ignore the various uncertain- 
ties and costs in real-world environments. For instance, joint intentions theory (Cohen & 
Levesque, 1991b) prescribes that team members attain mutual beliefs in key circumstances, 
but it ignores the cost of attaining mutual belief (e.g., via communication). Implementa- 
tions that blindly follow such prescriptions could engage in highly suboptimal coordination. 
On the other hand, practical systems have addressed costs and uncertainties of real-world 
environments. For instance, STEAM (Tambe, 1997; Tambe & Zhang, 1998) extends joint 
intentions with decision-theoretic communication selectivity. Unfortunately, the very prag- 
matism of such approaches often necessarily leads to a lack of theoretical rigor, so it remains 
unanswered whether STEAM's selectivity is the best an agent can do, or whether it is even 
necessary at all. The second key shortcoming of existing teamwork research is the lack 
of a characterization of the computational complexity of various aspects of teamwork deci- 
sions. Understanding the computational advantages of a practical coordination prescription 
could potentially justify the use of that prescription as an approximation to optimality in 
particular domains. 

To address these shortcomings, we propose a new complementary framework, the
COMmunicative Multiagent Team Decision Problem (COM-MTDP), inspired by work in
economic team theory (Marschak & Radner, 1971; Yoshikawa, 1978; Ho, 1980). While our
COM-MTDP model borrows from a theory developed in another field, we make several 
contributions in applying and extending the original theory, most notably adding explicit 
models of communication and system dynamics. With these extensions, the COM-MTDP 
generalizes other recently developed multiagent decision frameworks, such as decentralized 
POMDPs (Bernstein, Zilberstein, & Immerman, 2000). 

Our definition of a team (like that in economic team theory) assumes only that team 
members have a common goal and that they work selflessly towards that goal (i.e., they 
have no other private goals of their own). In terms of our decision-theoretic framework, we 
assume that all of the team members share the same joint utility function — that is, each 
team member's individual preferences are exactly the preferences of the other members and, 
thus, of the team as a whole. Our definition may appear to be a "bare-bones" definition of 
a team, since it does not include common concepts and assumptions from the literature on 
what constitutes a team (e.g., the teammates form a joint commitment (Cohen & Levesque, 
1991b), attain mutual belief upon termination of a joint goal, intend that teammates suc- 
ceed in their tasks (Grosz & Kraus, 1996), etc.). From our COM-MTDP perspective, we 
view these concepts as more intermediate concepts, as the means by which agents improve 
their team's overall performance, rather than ends in themselves. Our hypothesis in this 
investigation is that our COM-MTDP-based analysis can provide concrete justifications for
these concepts. For example, while mutual belief has no inherent value, our COM-MTDP
model can quantify the improved performance that we would expect from a team that 
attains mutual belief about important aspects of its execution. 

More generally, this paper demonstrates three new types of teamwork analyses made 
possible by the COM-MTDP model. First, we analyze the computational complexity of 
teamwork within subclasses of problem domains. For instance, some researchers have
advocated teamwork without communication (Goldberg & Mataric, 1997). We use the
COM-MTDP model to show that, in general, the problem of constructing optimal teams without
communication is NEXP-complete, but allowing free communication reduces the problem
to PSPACE-complete. This paper presents a breakdown of the complexity of optimal
teamwork over problem domains classified along the dimensions of observability and com- 
munication cost. 

Second, the COM-MTDP model provides a powerful tool for comparing the optimality 
of different coordination prescriptions across classes of domains. Indeed, we illustrate that 
we can encode existing team coordination strategies within a COM-MTDP for evaluation. 
For our analysis, we selected two joint intentions-based approaches from the literature: one 
using the approach realized within GRATE* and the joint responsibility model (Jennings, 
1995), and another based on STEAM (Tambe, 1997). Through this encoding, we derive the 
conditions under which these team coordination strategies generate optimal team behavior, 
and the complexity of the decision problems addressed by them. Furthermore, we also 
derive a novel team coordination algorithm that outperforms these existing strategies in 
optimality, though not in efficiency. The end result is a well-grounded characterization of 
the complexity-optimality tradeoff among various means of team coordination.

Third, we can use the COM-MTDP model to empirically analyze a specific domain of 
interest. We have implemented reusable, domain-independent algorithms that allow one to 
evaluate the optimality of the behavior generated by different prescriptive policies within a 
problem domain represented as a COM-MTDP. We apply these algorithms in an example 
domain to empirically evaluate the aforementioned team coordination strategies, charac- 
terizing the optimality of each strategy as a function of the properties of the underlying 
domain. For instance, Jennings reports experimental results (Jennings, 1995) indicating 
that his joint responsibility teamwork model leads to lower waste of community effort than 
competing methods of accomplishing teamwork. With our COM-MTDP model, we were 
able to demonstrate the benefits of Jennings' approach under many configurations of our ex- 
ample domain. However, in precisely characterizing the types of domains that showed such 
benefits, we also identified domains where these competing methods may actually perform 
better. In addition, we can use our COM-MTDP model to re-create and explain previous 
work that noted an instance of suboptimality in a STEAM-based, real-world implementa- 
tion (Tambe, 1997). While this previous work treated that suboptimality as anomalous, our 
COM-MTDP re-evaluation of the domain demonstrated that the observed suboptimality 
was a symptom of STEAM's general propensity towards extraneous communication in a
significant range of domain types. Both the algorithms and the example domain model are 
available for public use in an Online Appendix 1. 

Section 2 presents the COM-MTDP model's representation and places it in the context 
of related multiagent models from the literature. Section 3 uses the COM-MTDP model to 
define and characterize the complexity of designing optimal agent teams. Section 4 analyzes
the optimality of existing team coordination algorithms and derives a novel coordination
algorithm. Section 5 presents empirical results from applying our COM-MTDP algorithms 
to an example domain. Section 6 summarizes our results, and Section 7 identifies some 
promising future directions. 

2. The COM-MTDP Model 

This section defines and describes the COM-MTDP model itself and its ability to represent 
the important aspects of multiagent teamwork. We begin in Section 2.1 by defining the 
underlying multiagent team decision problem with no explicit communication. Section 2.2 
defines the complete COM-MTDP model with its extension to explicitly represent commu- 
nication. Section 2.3 provides an illustration of how the COM-MTDP model represents the 
execution of a team of agents. Finally, Section 2.4 describes related models of multiagent 
coordination and shows how the COM-MTDP model generalizes them. 

2.1 Multiagent Team Decision Problems 

Given a team of selfless agents, $\alpha$, who intend to perform some joint task, we wish to evaluate
possible policies of behavior. We represent a multiagent team decision problem (MTDP)
model as a tuple, $\langle S, A_\alpha, P, \Omega_\alpha, O_\alpha, B_\alpha, R \rangle$. We have taken the underlying components of
this model from the initial team decision model (Ho, 1980), but we have extended them to 
handle dynamic decisions over time and to more easily represent multiagent domains (in 
particular, agent beliefs). We assume that the model is common knowledge to all of the 
team members. In other words, all of the agents believe the same model, and they believe 
that they all believe the same model, etc. 

2.1.1 World States: S 

• $S = \Xi_1 \times \cdots \times \Xi_m$: a set of world states, expressed as a factored representation (a
cross product of separate features).

The state of the world here is the state of the team's environment (e.g., terrain, location of
enemy). Thus, each $\Xi_i$ represents the domain of an individual feature of that environment,
while S represents the domain of all possible combinations of values over the individual 
features. 

2.1.2 Domain-Level Actions: $A_\alpha$

$\{A_i\}_{i \in \alpha}$ is a set of actions for each agent to perform to change its environment, implicitly
defining a set of combined actions, $A_\alpha \equiv \prod_{i \in \alpha} A_i$ (corresponding to team theory's decision
variables).

Extension to Dynamic Problem: P The original team decision problem focused on 
a one-shot, static problem. We extend the original concept so that each component is a 
time series of random variables. The effects of domain-level actions (e.g., a flying action 
changes a helicopter's position) obey a probabilistic distribution, given by a function
$P : S \times A_\alpha \times S \to [0, 1]$. In other words, for each initial state $s$ at time $t$, combined action $a$
taken at time $t$, and final state $s'$ at time $t+1$, $\Pr(S^{t+1} = s' \mid S^t = s, A_\alpha^t = a) = P(s, a, s')$.
The given definition of P assumes that the world dynamics obey the Markov assumption. 
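
As a minimal sketch of how such a transition function might be stored and sampled, assuming finite state and action sets, consider the following. The state names, action labels, and probabilities are hypothetical and are not taken from the model above; the point is only that $P(s, a, \cdot)$ is a distribution over next states that depends on nothing but the current state and combined action.

```python
import random

# Hypothetical two-state, two-agent example (names and numbers are illustrative only).
# P[(s, a)] maps each next state s' to the probability P(s, a, s').
P = {
    ("far", ("fly", "fly")): {"far": 0.2, "near": 0.8},
    ("far", ("fly", "wait")): {"far": 0.6, "near": 0.4},
    ("near", ("fly", "fly")): {"near": 1.0},
    ("near", ("fly", "wait")): {"near": 1.0},
}


def is_valid_transition_model(P, tol=1e-9):
    """Check that every P(s, a, .) is a probability distribution."""
    return all(
        all(p >= 0.0 for p in dist.values())
        and abs(sum(dist.values()) - 1.0) <= tol
        for dist in P.values()
    )


def sample_next_state(P, s, a, rng):
    """Draw s' with probability P(s, a, s'); only s and a are consulted (Markov assumption)."""
    dist = P[(s, a)]
    states = list(dist)
    return rng.choices(states, weights=[dist[x] for x in states], k=1)[0]
```

A caller would pass a seeded `random.Random` instance as `rng` to obtain reproducible trajectories.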

2.1.3 Agent Observations: $\Omega_\alpha$

$\{\Omega_i\}_{i \in \alpha}$ is a set of observations that each agent, $i$, can experience of its world, implicitly
defining a combined observation, $\Omega_\alpha \equiv \prod_{i \in \alpha} \Omega_i$. $\Omega_i$ may include elements corresponding
to indirect evidence of the state (e.g., sensor readings) and actions of other agents (e.g.,
movement of other helicopters). In the original team-theoretic framework, the information
structure that represented the observation process of the agents was a set of deterministic
functions, $O_i : S \to \Omega_i$.

Extension of Allowable Information Structures: $O_\alpha$ We extend the information
structure representation to allow for uncertain observations. We use a general stochastic
model, borrowed from the partially observable Markov decision process model (Smallwood &
Sondik, 1973), with a joint observation function: $O_\alpha(s, a, \omega) \equiv \Pr(\Omega_\alpha^t = \omega \mid S^t = s, A_\alpha^{t-1} = a)$.
This function models the sensors, representing any errors, noise, etc. In some cases, we
can separate this joint distribution into individual observation functions: $O_\alpha \equiv \prod_{i \in \alpha} O_i$,
where $O_i(s, a, \omega) \equiv \Pr(\Omega_i^t = \omega \mid S^t = s, A_\alpha^{t-1} = a)$. Thus, the probability distribution
specified by Oa forms the richer information structure used in our model. We can make 
useful distinctions between different classes of information structures: 

Collective Partial Observability This is the general case, where we make no assump- 
tions on the observations. 

Collective Observability There is a unique world state for the combined observations of
the team: $\forall \omega \in \Omega_\alpha$, $\exists s \in S$ such that $\forall s' \neq s$, $\Pr(\Omega_\alpha^t = \omega \mid S^t = s') = 0$. The set
of domains that are collectively observable is a strict subset of the domains that are 
collectively partially observable. 

Individual Observability There is a unique world state for each individual agent's
observations: $\forall \omega \in \Omega_i$, $\exists s \in S$ such that $\forall s' \neq s$, $\Pr(\Omega_i^t = \omega \mid S^t = s') = 0$. The set
of domains that are individually observable is a strict subset of the domains that are 
collectively observable. 

Non-Observability The agents receive no feedback from the world: $\exists \omega \in \Omega_i$ such that
$\forall s \in S$ and $\forall a \in A_\alpha$, $\Pr(\Omega_i^t = \omega \mid S^t = s, A_\alpha^{t-1} = a) = 1$. This assumption holds
in open-loop systems, which come under frequent consideration in classical
planning (Boutilier, Dean, & Hanks, 1999).
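
The observability classes above can be checked mechanically for a finite model. The following is a sketch under stated assumptions: the joint observation function is stored as a dictionary from (state, combined action) pairs to distributions over joint observations, each joint observation is a tuple with one entry per agent, and the two example models are hypothetical.

```python
from collections import defaultdict


def states_supporting(O, project):
    """Map each projected observation to the set of states that can produce it."""
    support = defaultdict(set)
    for (s, _a), dist in O.items():
        for omega, p in dist.items():
            if p > 0.0:
                support[project(omega)].add(s)
    return support


def collectively_observable(O):
    """Every joint observation is possible in at most one world state."""
    support = states_supporting(O, lambda w: w)
    return all(len(states) <= 1 for states in support.values())


def individually_observable(O, n_agents):
    """Every single agent's observation is possible in at most one world state."""
    return all(
        len(states) <= 1
        for i in range(n_agents)
        for states in states_supporting(O, lambda w, i=i: w[i]).values()
    )


def non_observable(O, n_agents):
    """Each agent sees one fixed observation regardless of state and action."""
    return all(
        len({w[i] for dist in O.values() for w, p in dist.items() if p > 0.0}) == 1
        for i in range(n_agents)
    )


# Hypothetical example: agent 0 senses the state exactly, agent 1 senses nothing.
O_scout = {
    ("L", "a"): {("L", "none"): 1.0},
    ("R", "a"): {("R", "none"): 1.0},
}
# Hypothetical open-loop example: neither agent receives any feedback.
O_blind = {
    ("L", "a"): {("none", "none"): 1.0},
    ("R", "a"): {("none", "none"): 1.0},
}
```

In this sketch, `O_scout` is collectively but not individually observable, because agent 1 alone cannot identify the state, which illustrates why individual observability is the stricter condition.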

2.1.4 Policy (Strategy) Space 

$\pi_{iA}$ is a domain-level policy (or strategy, in the original team theory specification) to map
an agent's belief state to an action. In the original formalism, the agent's beliefs correspond
directly to its observations (i.e., $\pi_{iA} : \Omega_i \to A_i$).

Extension to Richer Belief State Space: $B_\alpha$ We generalize the set of possible
strategies to capture the more complex mental states of the agents. Each agent, $i \in \alpha$, forms a
belief state, $b_i^t \in B_i$, based on its observations seen through time $t$, where $B_i$ circumscribes
the set of possible belief states for the agent. Thus, we define the set of possible
domain-level policies as mappings from belief states to actions, $\pi_{iA} : B_i \to A_i$. We define the set
of possible combined belief states over all agents to be $B_\alpha \equiv \prod_{i \in \alpha} B_i$. The corresponding
random variable, $b_\alpha^t$, represents the agents' combined belief state at time $t$. We elaborate
on different types of belief states and the mapping of observations to belief states (i.e., the 
state estimator function) in Section 2.2.1. 

2.1.5 Reward Function: R 

A common reward function is central to the notion of teamwork in an MTDP: $R : S \times A_\alpha \to \mathbb{R}$.
This function represents the team's joint preferences over states and the cost of
domain-level actions (e.g., destroying enemy is good, returning to home base with only 10% of
original force is bad). We assume that, as selfless team members, each agent shares these 
preferences at the individual level as well. Therefore, each team member wants exactly 
what is best for the team as a whole. 

2.2 Extension for Explicit Communication: $\Sigma_\alpha$

We make an explicit separation between domain-level actions ($A_\alpha$) and communicative
actions. As defined in this section, communicative actions affect the receiving agents' indi- 
vidual belief states, but, unlike domain-level actions, they do not directly change the world 
state. Although this distinction is sometimes blurry in real-world domains, we make this 
explicit separation so as to isolate, as much as possible, the effects of the two types of
actions. The leverage gained from this separation provides the basis for the informative, 
analytical results presented in the rest of this paper. To capture this separation, we extend 
our initial MTDP model to be a communicative multiagent team decision problem
(COM-MTDP), which we define as a tuple, $\langle S, A_\alpha, \Sigma_\alpha, P, \Omega_\alpha, O_\alpha, B_\alpha, R \rangle$, with a new component,
$\Sigma_\alpha$, and an extended reward function, $R$.

2.2.1 Communication: $\Sigma_\alpha$

$\{\Sigma_i\}_{i \in \alpha}$ is a set of possible messages for each agent, implicitly defining a set of combined
communications, $\Sigma_\alpha \equiv \prod_{i \in \alpha} \Sigma_i$. An agent, $i$, may communicate message $x \in \Sigma_i$ to its
teammates, who interpret the communication by updating their belief states in response. As
a first step in this work, we assume that all of the agents receive the messages instantaneously 
and correctly (i.e., there is no lag or noise in the communication channels). This model is 
common knowledge among all of the team members, so once an agent has sent a message, 
it knows that its team members have received the message, and its team members know 
that it knows that they have all received the message, and so on. 

With communication, we divide each decision epoch into two phases: the pre-communication
and post-communication phases, denoted by the subscripts $\bullet\Sigma$ and $\Sigma\bullet$, respectively.
In particular, the agents update their belief states at two distinct points within each
decision epoch: once upon receiving observation $\Omega_i^t$ (producing the pre-communication
belief state $b^t_{i\bullet\Sigma}$), and again upon receiving the other agents' messages (producing the
post-communication belief state $b^t_{i\Sigma\bullet}$). The distinction allows us to differentiate between the belief
state used by the agents in selecting their communication actions and the more "up-to-date" 
belief state used in selecting their domain-level actions. We also distinguish between the
separate state-estimator functions used in each update phase:

$b_i^0 = SE_i^0()$ (1)
$b^t_{i\bullet\Sigma} = SE_{i\bullet\Sigma}(b^{t-1}_{i\Sigma\bullet}, \Omega_i^t)$ (2)
$b^t_{i\Sigma\bullet} = SE_{i\Sigma\bullet}(b^t_{i\bullet\Sigma}, \Sigma_\alpha^t)$ (3)

where $SE_{i\bullet\Sigma} : B_i \times \Omega_i \to B_i$ is the pre-communication state estimator for agent $i$, and
$SE_{i\Sigma\bullet} : B_i \times \Sigma_\alpha \to B_i$ is the post-communication state estimator for agent $i$. The initial
state estimator, $SE_i^0 : \emptyset \to B_i$, specifies the agent's prior beliefs, before any observations
are made. For each of these, we also make the obvious definitions for the corresponding
estimators for the combined belief states: $SE_{\alpha\bullet\Sigma}$, $SE_{\alpha\Sigma\bullet}$, and $SE_\alpha^0$.

In this paper, as a first step, we assume that the agents have perfect recall. In other 
words, the agents recall all of their observations, as well as all communication of the other 
agents. Thus, their belief states can represent their entire histories as sequences of
observations and received messages: $B_i = \Omega_i^* \times \Sigma_\alpha^*$, where $X^*$ denotes the set of all possible
sequences (of any length) of elements of X. The agents realize perfect recall through the 
following state estimator functions: 

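$SE_i^0() = \langle\rangle$ (4)

$SE_{i\bullet\Sigma}(\langle \langle \Omega_i^0, \Sigma_\alpha^0 \rangle, \ldots, \langle \Omega_i^{t-1}, \Sigma_\alpha^{t-1} \rangle \rangle, \Omega_i^t) = \langle \langle \Omega_i^0, \Sigma_\alpha^0 \rangle, \ldots, \langle \Omega_i^{t-1}, \Sigma_\alpha^{t-1} \rangle, \langle \Omega_i^t, \cdot \rangle \rangle$ (5)

$SE_{i\Sigma\bullet}(\langle \langle \Omega_i^0, \Sigma_\alpha^0 \rangle, \ldots, \langle \Omega_i^{t-1}, \Sigma_\alpha^{t-1} \rangle, \langle \Omega_i^t, \cdot \rangle \rangle, \Sigma_\alpha^t) = \langle \langle \Omega_i^0, \Sigma_\alpha^0 \rangle, \ldots, \langle \Omega_i^t, \Sigma_\alpha^t \rangle \rangle$ (6)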

In other words, $SE_i^0$ initializes agent $i$'s belief state to be an empty history, $SE_{i\bullet\Sigma}$ appends a
new observation to agent $i$'s belief state, and $SE_{i\Sigma\bullet}$ appends new messages to agent $i$'s belief
state. Under this paper's assumptions of perfect recall, all three state-estimator functions 
take only constant time. However, we can potentially allow more complex functions (though 
the complexity results presented hold only if the state-estimator functions take polynomial 
time). For instance, although we assume perfect, synchronous, instantaneous communica- 
tion here, we could potentially use the post-communication state estimator to model any 
noise, temporal delays, asynchrony, cognitive burden, etc. present in the communication 
channel. 
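
The three perfect-recall state estimators can be sketched as operations on an immutable history. This is an illustration under assumptions rather than the authors' implementation: the function names and the `PENDING` marker (standing for the messages of the current epoch before they arrive) are hypothetical.

```python
PENDING = "pending"  # hypothetical marker: messages for the newest epoch not yet received


def se_initial():
    """Initial state estimator: the belief state starts as an empty history."""
    return ()


def se_pre_comm(belief, observation):
    """Pre-communication estimator: append the new observation to the history."""
    return belief + ((observation, PENDING),)


def se_post_comm(belief, messages):
    """Post-communication estimator: record the combined message for the newest epoch."""
    observation, pending = belief[-1]
    if pending != PENDING:
        raise ValueError("messages for this epoch were already recorded")
    return belief[:-1] + ((observation, messages),)
```

Because tuples are immutable, each update here copies the history and so takes time linear in its length; a mutable or linked history would be needed to match the constant-time claim in the text.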

We extend our definition of a policy of behavior to include a communication policy,
$\pi_{i\Sigma} : B_i \to \Sigma_i$, analogous to Section 2.1.4's domain-level policy. We define the joint policies,
$\pi_{\alpha\Sigma}$ and $\pi_{\alpha A}$, as the combined policies across all agents in $\alpha$.

2.2.2 Extended Reward Function: R 

We extend the team's reward function to also represent the cost of communicative acts (e.g., 
communication channels may have associated cost): $R : S \times A_\alpha \times \Sigma_\alpha \to \mathbb{R}$. We assume that
the cost of communication and of domain-level actions are independent of each other, so we
can decompose the reward function into two components: a communication-level reward,
$R_\Sigma : S \times \Sigma_\alpha \to \mathbb{R}$, and a domain-level reward, $R_A : S \times A_\alpha \to \mathbb{R}$. The total reward is
the sum of the two component values: $R(s, a, \sigma) = R_A(s, a) + R_\Sigma(s, \sigma)$. We assume that
communication has no inherent benefit and may instead have some cost, so that for all
states, $s \in S$, and messages, $\sigma \in \Sigma_\alpha$, the reward is never positive: $R_\Sigma(s, \sigma) \leq 0$. However,
although we assign communication no explicit value, it can have significant implicit value 
through its effect on the agents' belief states and, subsequently, on their future actions. 
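
The additive decomposition of the reward can be sketched as follows. The domain-level reward and the per-message cost of 0.25 are hypothetical values chosen only to make the two components visible; `None` is assumed to denote an agent staying silent.

```python
def make_reward(domain_reward, comm_reward):
    """Build R(s, a, sigma) as the sum of a domain-level and a communication-level reward."""
    def reward(s, a, sigma):
        r_sigma = comm_reward(s, sigma)
        if r_sigma > 0:
            raise ValueError("communication reward must never be positive")
        return domain_reward(s, a) + r_sigma
    return reward


# Hypothetical domain-level reward: 1 when the second agent's action matches the state.
def domain_reward(s, a):
    return 1.0 if a[1] == s else 0.0


# Hypothetical communication cost: 0.25 per message actually sent (None means silence).
def costly_comm(s, sigma):
    return -0.25 * sum(1 for m in sigma if m is not None)


# Communication with zero cost in every state.
def free_comm(s, sigma):
    return 0.0


R_costly = make_reward(domain_reward, costly_comm)
R_free = make_reward(domain_reward, free_comm)
```

With these definitions, the same state, action, and message yield a lower total reward under `R_costly` than under `R_free`, even though the message may be what allowed the correct action to be chosen.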

As with the observability function, we parameterize the communication costs associated 
with message transmissions: 

General Communication: We make no assumptions about communication. 

Free Communication: $R_\Sigma(s, \sigma) = 0$ for any $\sigma \in \Sigma_\alpha$ and $s \in S$. In other words,
communication actions have no effect on the agents' reward.

No Communication: $\Sigma_\alpha = \emptyset$, i.e., no explicit communication. Alternatively,
communication may be prohibitively expensive, so that $\forall \sigma \in \Sigma_\alpha$ and $s \in S$, $R_\Sigma(s, \sigma) = -\infty$.

The free-communication case appears in the literature, when researchers wish to focus 
on issues other than communication cost. Although real-world domains rarely exhibit
such ideal conditions, we may be able to model some domains as having approximately free 
communication to a sufficient degree. In addition, analyzing this extreme case gives us some 
understanding of the benefit of communication, even if the results do not apply across all 
domains. We also identify the no-communication case because such decision problems have
been of interest to researchers as well (Goldberg & Mataric, 1997). Of course, even if $\Sigma_\alpha = \emptyset$,
it is possible that there are domain-level actions in $A_\alpha$ that have implicit communicative
value by acting as signals that convey information to the other agents. However, we still 
label such agent teams as having no communication for the purposes of the work here, since 
many of our results exploit an explicit separation between domain- and communication-level
actions. 

2.3 Model Illustration 

We can view the evolving state as a Markov chain with separate stages for domain-level
and communication-level actions. In other words, each agent team member, $i \in \alpha$, begins
in some initial state, $S^0$, with initial belief state, $b_i^0 = SE_i^0()$. Each agent receives an
observation $\Omega_i^0$ drawn according to the probability distribution $O_\alpha(S^0, \text{null}, \Omega_\alpha^0)$ (there are
no actions yet). Then, each agent updates its belief state, $b^0_{i\bullet\Sigma} = SE_{i\bullet\Sigma}(b_i^0, \Omega_i^0)$.

Next, each agent $i \in \alpha$ selects a message according to its communication policy,
$\Sigma_i^0 = \pi_{i\Sigma}(b^0_{i\bullet\Sigma})$, defining a combined communication, $\Sigma_\alpha^0$. Each agent interprets the
communications of all of the others by updating its belief state, $b^0_{i\Sigma\bullet} = SE_{i\Sigma\bullet}(b^0_{i\bullet\Sigma}, \Sigma_\alpha^0)$. Each
then selects an action according to its domain-level policy, $A_i^0 = \pi_{iA}(b^0_{i\Sigma\bullet})$, defining a
combined action $A_\alpha^0$. By our central assumption of teamwork, each agent receives the
same joint reward, $R^0 = R(S^0, A_\alpha^0, \Sigma_\alpha^0)$. The world then moves into a new state, $S^1$,
according to the distribution, $P(S^0, A_\alpha^0)$. Again, each agent $i$ receives an observation $\Omega_i^1$
drawn from $\Omega_i$ according to the distribution $O_\alpha(S^1, A_\alpha^0, \Omega_\alpha^1)$, and it updates its belief state,
$b^1_{i\bullet\Sigma} = SE_{i\bullet\Sigma}(b^0_{i\Sigma\bullet}, \Omega_i^1)$.

The process continues, with agents choosing communication- and domain-level actions,
observing the effects, and updating their beliefs. Thus, in addition to the time series of world
states, $S^0, S^1, \ldots, S^T$, the agents themselves determine a time series of communication-level
and domain-level actions, $\Sigma_\alpha^0, \Sigma_\alpha^1, \ldots, \Sigma_\alpha^T$ and $A_\alpha^0, A_\alpha^1, \ldots, A_\alpha^T$, respectively. We also have
a time series of observations for each agent $i$, $\Omega_i^0, \Omega_i^1, \ldots, \Omega_i^T$. Likewise, we can treat the
combined observations, $\Omega_\alpha^0, \Omega_\alpha^1, \ldots, \Omega_\alpha^T$, as a similar time series of random variables.

Finally, the agents receive a series of rewards, $R^0, R^1, \ldots, R^T$. We can define the value,
$V$, of the policies, $\pi_{\alpha A}$ and $\pi_{\alpha\Sigma}$, as the expected reward received when executing those
policies. Over a finite horizon, $T$, this value is equivalent to the following:

$V^T(\pi_{\alpha A}, \pi_{\alpha\Sigma}) = E\left[\sum_{t=0}^{T} R^t \mid \pi_{\alpha A}, \pi_{\alpha\Sigma}\right]$ (7)
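
The execution cycle described in this section, together with the finite-horizon value, can be approximated by simulation. The sketch below is a Monte Carlo estimator under stated assumptions, not the software package mentioned in the paper: the dictionary-of-callables model interface, the toy domain, and all policy names are hypothetical, and beliefs are perfect-recall histories.

```python
import random


def rollout(model, comm_policies, action_policies, horizon, rng):
    """Simulate decision epochs t = 0..horizon and return the summed reward."""
    agents = range(model["n_agents"])
    s = model["initial_state"](rng)
    beliefs = [() for _ in agents]   # every agent starts from an empty history
    prev_actions = None              # stands in for the "null" action at t = 0
    total = 0.0
    for _t in range(horizon + 1):
        obs = model["observe"](s, prev_actions, rng)
        # Pre-communication update: append the new observation, messages pending.
        beliefs = [beliefs[i] + ((obs[i], None),) for i in agents]
        msgs = tuple(comm_policies[i](beliefs[i]) for i in agents)
        # Post-communication update: record the combined message.
        beliefs = [b[:-1] + ((b[-1][0], msgs),) for b in beliefs]
        acts = tuple(action_policies[i](beliefs[i]) for i in agents)
        total += model["reward"](s, acts, msgs)
        s = model["transition"](s, acts, rng)
        prev_actions = acts
    return total


def estimate_value(model, comm_policies, action_policies, horizon, n_samples=200, seed=0):
    """Monte Carlo estimate of the expected summed reward over the horizon."""
    rng = random.Random(seed)
    samples = (
        rollout(model, comm_policies, action_policies, horizon, rng)
        for _ in range(n_samples)
    )
    return sum(samples) / n_samples


# Hypothetical toy domain: a static state "L" or "R"; agent 0 observes it, agent 1 does not.
def toy_reward(s, acts, msgs):
    domain = 1.0 if acts[1] == s else 0.0            # agent 1 must match the state
    comm = -0.25 * sum(m is not None for m in msgs)  # each sent message costs 0.25
    return domain + comm


toy = {
    "n_agents": 2,
    "initial_state": lambda rng: rng.choice(["L", "R"]),
    "observe": lambda s, prev_actions, rng: (s, "none"),
    "transition": lambda s, acts, rng: s,
    "reward": toy_reward,
}

tell = [lambda b: b[-1][0], lambda b: None]          # agent 0 reports its observation
silent = [lambda b: None, lambda b: None]            # nobody communicates
follow = [lambda b: "noop", lambda b: b[-1][1][0]]   # agent 1 acts on agent 0's message
guess_left = [lambda b: "noop", lambda b: "L"]       # agent 1 always guesses "L"
```

In this toy domain the communicating team earns 0.75 per epoch with certainty, whereas the silent team's estimate depends on how often the sampled state happens to be "L", which shows the implicit value of a message that carries an explicit cost.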







2.4 Related Work 

The COM-MTDP model subsumes many existing multiagent models, as presented in Ta- 
ble 1 (i.e., we can map any instance of these models into a corresponding COM-MTDP). 
This generality enables us to perform novel analyses of real-world teamwork domains, as 
demonstrated by Section 4's use of the COM-MTDP model for analyzing the optimality of 
communication decisions. 

2.4.1 Decentralized POMDPs 

With its model of observability and world dynamics, our COM-MTDP model closely par- 
allels the structure of the decentralized partially observable Markov decision process (DEC- 
POMDP) (Bernstein et al., 2000). Following our notational conventions, a DEC-POMDP 
is a tuple, $\langle S, A_\alpha, P, \Omega_\alpha, O_\alpha, R \rangle$. There is no set of possible messages, $\Sigma_\alpha$, so the
DEC-POMDP falls into the class of domains with no communication. The DEC-POMDP
observational model, $O$, is general enough to capture collectively partially observable domains.

2.4.2 Partially Observable Identical Payoff Stochastic Games 

Stochastic games provide a rich framework for multiagent decision making when the agents 
may have their own individual goals and preferences. The identical payoff stochastic game 
(IPSG) restricts the agents to share a single payoff function, appropriate for modeling 
the single, global reward function of the team context. The partially observable IPSG 
(POIPSG) (Peshkin, Kim, Meuleau, & Kaelbling, 2000) is a tuple, $\langle S, A_\alpha, P, \Omega_\alpha, O_\alpha, R \rangle$,
very similar to the DEC-POMDP model. In other words, the observation function, $O_\alpha$, is
general enough to support collectively partially observable domains, and there is no commu- 
nication. 

2.4.3 Multiagent MDPs 

Another relevant model is the multiagent Markov decision process (MMDP) (Boutilier, 
1996), which is a tuple, $\langle S, A_\alpha, P, R \rangle$, in our notation. Like the DEC-POMDP, the MMDP
has no communication. In addition, the MMDP is a multiagent extension to the completely 
observable MDP model, so it assumes an environment that is individually observable.

Model          $\Sigma_\alpha$            $O_\alpha$
DEC-POMDP      no communication           collective partial observability
POIPSG         no communication           collective partial observability
MMDP           no communication           individual observability
Xuan-Lesser    general communication      collective observability

Table 1: Existing models as COM-MTDP subsets.



2.4.4 Xuan-Lesser Framework 

The COM-MTDP's separation of communication from other actions is similar to previous 
work on multiagent decision models (Xuan, Lesser, & Zilberstein, 2001), which supported 
general communication. However, while the Xuan-Lesser model generalizes beyond indi- 
vidually observable environments, it supports only a subset of collectively observable envi- 
ronments. In particular, the Xuan-Lesser framework cannot represent agents who receive 
local observations of a common world state, where the observations of different agents could 
potentially be interdependent. 

3. COM-MTDP Complexity Analysis 

We can use the COM-MTDP model to prove some results about the complexity of con- 
structing optimal agent teams (i.e., teams that coordinate to produce optimal behavior in 
a problem domain). The problem facing these agents (or the designer of these agents) is 
how to construct the joint policies, $\pi_{\alpha\Sigma}$ and $\pi_{\alpha A}$, so as to maximize their joint reward,
as represented by the expected value, $V^T(\pi_{\alpha A}, \pi_{\alpha\Sigma})$. In all of the results presented, we
assume that all of the values in a model instance (e.g., transition probabilities, rewards) are 
rational numbers, so that we can express the particular instance as a finite-sized input. 

Theorem 1 The decision problem of whether there exist policies, $\pi_{\alpha\Sigma}$ and $\pi_{\alpha A}$, for a given
COM-MTDP, under general communication and collective partial observability, that yield
a total reward at least $K$ over some finite horizon $T$ is NEXP-complete if $|\alpha| \geq 2$ (i.e.,
more than one agent).

Proof: To prove that the COM-MTDP decision problem is NEXP-hard, we reduce a DEC- 
POMDP (Bernstein et al., 2000) to a COM-MTDP with no communication by copying 
all of the other model features from the given DEC-POMDP. In other words, if we are 
given a DEC-POMDP, $\langle S, \{A_i\}_{i=1}^m, P, \{\Omega_i\}_{i=1}^m, O, R \rangle$, we can construct a COM-MTDP,
$\langle S', A'_\alpha, \Sigma'_\alpha, P', \Omega'_\alpha, O'_\alpha, B'_\alpha, R' \rangle$, as follows:

$S' = S$

$A'_i = A_i$

$\Sigma'_\alpha = \emptyset$

$P'(s, \langle a_1, \ldots, a_m \rangle, s') = P(s' \mid s, a_1, \ldots, a_m)$

$B'_i = \bigcup_{j=0}^{T} (\Omega_i)^j$ (i.e., observation sequences of length no more than the finite horizon)

$R'(s, \langle a_1, \ldots, a_m \rangle, \sigma) = R(s, a_1, \ldots, a_m)$

The DEC-POMDP assumes perfect recall, so we use the state estimator functions from
Equations 5 and 6. Since there is no communication for this COM-MTDP, we have a fixed
silent policy, $\pi_{\alpha\Sigma}$. We can translate any domain-level policy, $\pi_{\alpha A}$, into a DEC-POMDP
joint policy, $\delta$, as follows:

$\delta_i(o_i^1, \ldots, o_i^t) \equiv \pi_{iA}(\langle o_i^1, \ldots, o_i^t \rangle)$ (8)

The expected utility of following this joint policy, $\delta$, within the DEC-POMDP is identical
to that of following $\pi_{\alpha\Sigma}$ and $\pi_{\alpha A}$ within the constructed COM-MTDP. Thus, there exists
a policy with expected utility at least $K$ for the COM-MTDP if and only if there
exists one for the DEC-POMDP. The decision problem for a DEC-POMDP is known to be
NEXP-complete, so the COM-MTDP problem must be NEXP-hard.

To show that the COM-MTDP is in NEXP, our proof proceeds similarly to that of 
the DEC-POMDP. In other words, we guess the joint policy, $\pi_\alpha$, and write it down in
exponential time (we assume that $T < |S|$). We can take the COM-MTDP plus the policy
and generate (in exponential time) a corresponding MDP where the state space is the space
of all possible combined belief states of the agents. We can then use dynamic programming
to determine (in exponential time) whether $\pi_\alpha$ generates an expected reward of at least K.
□ 
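
To make the "write it down in exponential time" step concrete, the following sketch counts the observation histories that one agent's finite-horizon policy must cover, and the number of deterministic policies over those histories. The function names and the example sizes are illustrative assumptions; they are not part of the COM-MTDP definition.

```python
def num_histories(num_observations: int, horizon: int) -> int:
    """Number of observation sequences of length 1..horizon for one agent."""
    return sum(num_observations ** j for j in range(1, horizon + 1))


def num_deterministic_policies(num_actions: int, num_observations: int,
                               horizon: int) -> int:
    """Number of mappings from observation histories to actions."""
    return num_actions ** num_histories(num_observations, horizon)


if __name__ == "__main__":
    # Hypothetical sizes: 2 observations and 2 actions for a single agent.
    for horizon in (1, 2, 3, 4):
        print(horizon,
              num_histories(2, horizon),
              num_deterministic_policies(2, 2, horizon))
```

The policy table grows exponentially with the horizon, while the number of candidate policies grows doubly exponentially, which is why the proof guesses a policy rather than enumerating all of them.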

In the remainder of this section, we examine the effect of communication on the complexity
of constructing team policies that generate optimal behavior. We start by examining
the case under the condition of free communication, where we would expect the benefit of
communication to be the greatest. To begin with, suppose that each agent is capable of
communicating its entire observation (i.e., $\Sigma_i \supseteq \Omega_i$). Before we analyze the complexity of
the team decision problem, we first prove that the agents should exploit this capability and 
communicate their true observation, as long as they incur no cost in doing so: 

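Theorem 2 Under free communication, consider a team of agents using a communication
policy: $\pi_{i\Sigma}(b^{t}_{i\bullet\Sigma}) \equiv \Omega^t_i$. If the domain-level policy $\pi_{\alpha A}$ maximizes $V^T(\pi_{\alpha A}, \pi_{\alpha\Sigma})$, then this
combined policy is dominant over any other policies. In other words, for all policies, $\pi'_{\alpha A}$
and $\pi'_{\alpha\Sigma}$, $V^T(\pi_{\alpha A}, \pi_{\alpha\Sigma}) \geq V^T(\pi'_{\alpha A}, \pi'_{\alpha\Sigma})$.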

Proof: Suppose we have some other communication policy, $\pi'_{\alpha\Sigma}$, that specifies something
other than complete communication (e.g., keeping quiet, lying). Suppose that there is some
domain-level policy, $\pi'_{\alpha A}$, that allows the team to attain some expected reward, K, when
used in combination with $\pi'_{\alpha\Sigma}$. Then, we can construct a domain-level policy, $\pi_{\alpha A}$, such
that the team attains the same expected reward, K, when used in conjunction with the
complete-communication policy, $\pi_{\alpha\Sigma}$, as defined in the statement of Theorem 2.

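The communication policy, $\pi'_{\alpha\Sigma}$, produces a different set of belief states (denoted ${b'}^{t}_{i\bullet\Sigma}$
and ${b'}^{t}_{i\Sigma\bullet}$) than those for $\pi_{\alpha\Sigma}$ (denoted $b^{t}_{i\bullet\Sigma}$ and $b^{t}_{i\Sigma\bullet}$). In particular, we use state estimator
functions, $SE'_{i\bullet\Sigma}$ and $SE'_{i\Sigma\bullet}$, as defined in Equations 5 and 6 to generate ${b'}^{t}_{i\bullet\Sigma}$ and ${b'}^{t}_{i\Sigma\bullet}$.
Each belief state is a complete history of observation and communication pairs for each
agent. On the other hand, under the complete communication of $\pi_{\alpha\Sigma}$, the state estimator
functions of Equations 5 and 6 reduce to:

$$SE_{i\bullet\Sigma}\left(\langle \Omega^0_\alpha, \ldots, \Omega^{t-1}_\alpha \rangle, \Omega^t_i\right) = \langle \Omega^0_\alpha, \ldots, \Omega^{t-1}_\alpha, \Omega^t_i \rangle \tag{9}$$

$$SE_{i\Sigma\bullet}\left(\langle \Omega^0_\alpha, \ldots, \Omega^{t-1}_\alpha, \Omega^t_i \rangle, \Sigma^t_\alpha\right) = \langle \Omega^0_\alpha, \ldots, \Omega^{t-1}_\alpha, \Omega^t_\alpha \rangle \tag{10}$$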

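Thus, $\pi_{\alpha A}$ is defined over a different set of belief states than $\pi'_{\alpha A}$. In order to determine
an equivalent $\pi_{\alpha A}$, we must first define a recursive mapping, $m$, that translates the belief
states defined by $\pi_{\alpha\Sigma}$ into those defined by $\pi'_{\alpha\Sigma}$:

$$m_i\left(b^{t}_{i\Sigma\bullet}\right) = {b'}^{t}_{i\Sigma\bullet} = SE'_{i\Sigma\bullet}\left( SE'_{i\bullet\Sigma}\left(m_i(b^{t-1}_{i\Sigma\bullet}), \Omega^t_i\right), \left\langle \pi'_{j\Sigma}\left(SE'_{j\bullet\Sigma}\left(m_j(b^{t-1}_{j\Sigma\bullet}), \Omega^t_j\right)\right) \right\rangle_{j \in \alpha} \right) \tag{11}$$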

Given this mapping, we then specify: $\pi_{iA}(b^{t}_{i\Sigma\bullet}) = \pi'_{iA}(m_i(b^{t}_{i\Sigma\bullet}))$. Executing this domain-level
policy, in conjunction with the communication policy, $\pi_{\alpha\Sigma}$, results in the identical
behavior as execution of the alternate policies, $\pi'_{\alpha A}$ and $\pi'_{\alpha\Sigma}$. Therefore, the team following
the policies, $\pi_{\alpha A}$ and $\pi_{\alpha\Sigma}$, will achieve the same expected value of K, as under $\pi'_{\alpha A}$ and
$\pi'_{\alpha\Sigma}$. □

Given this dominance of the complete-communication policy, we can prove that the 
problem of constructing teams that coordinate optimally is simpler when communication is 
free. 
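
Equations 9 and 10 can be read as simple sequence operations. The sketch below is a minimal illustration, assuming each observation is a string and a joint observation is a tuple with one entry per agent; the function names are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple

JointObs = Tuple[str, ...]  # one observation per agent


def se_pre(history: Tuple[JointObs, ...], own_obs: str) -> tuple:
    """Pre-communication update (cf. Equation 9): append own observation."""
    return history + (own_obs,)


def se_post(pre_belief: tuple, messages: Sequence[str]) -> Tuple[JointObs, ...]:
    """Post-communication update (cf. Equation 10).

    Under complete communication every message is the sender's true
    observation, so the received messages form the joint observation,
    which replaces the agent's own last entry.
    """
    return pre_belief[:-1] + (tuple(messages),)


def run_step(history: Tuple[JointObs, ...], joint_obs: JointObs) -> list:
    """Apply both updates for every agent; return each agent's new belief."""
    beliefs = []
    for own_obs in joint_obs:
        pre = se_pre(history, own_obs)
        beliefs.append(se_post(pre, joint_obs))  # messages == observations
    return beliefs
```

After each step every agent holds the same joint observation history, which is what lets a single domain-level policy over that shared history reproduce the behavior of any alternative pair of policies.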



Theorem 3 The decision problem of determining whether there exist policies, $\pi_{\alpha\Sigma}$ and
$\pi_{\alpha A}$, for a given COM-MTDP with free communication under collective partial observability,
that yield a total reward at least K over some finite horizon T is PSPACE-complete.

Proof: To prove that the problem is PSPACE-hard, we reduce the single-agent POMDP to
a COM-MTDP. In particular, if we are given a POMDP, $\langle S, A, P, \Omega, O, R \rangle$, we can construct
a COM-MTDP, $\langle S', A'_1, \Sigma'_1, P', \Omega'_1, O'_1, B'_1, R' \rangle$, for a single-agent team (i.e., $\alpha = \{1\}$):

$S' = S$

$A'_1 = A$

$\Sigma'_1 = \emptyset$

$P'(s, \langle a_1 \rangle, s') = P(s, a_1, s')$

$\Omega'_1 = \Omega$

$O'_1(s, \langle a_1 \rangle, \langle \omega_1 \rangle) = O(s, a_1, \omega_1)$

$B'_1 = \bigcup_{j=1}^{T} (\Omega)^j$ (i.e., observation sequences of length no more than the finite horizon)

$R'_A(s, \langle a_1 \rangle) = R(s, a_1)$

$R'_\Sigma(s, \sigma) = 0$

This COM-MTDP satisfies our assumption of free communication. The POMDP assumes 
perfect recall, so we use the state estimator functions from Equations 5 and 6. Just as in 
the proof of Theorem 1, we can show that there exists a policy with expected utility greater 
than K for this COM-MTDP if and only if there exists one for the POMDP. The decision 
problem for the POMDP is known to be PSPACE-hard (Papadimitriou & Tsitsiklis, 1987), 
so the COM-MTDP problem under free communication must be PSPACE-hard. 

To show that the problem is in PSPACE, we take a COM-MTDP under free communication
and reduce it to a single-agent POMDP. In particular, if we are given a COM-MTDP,
$\langle S, A_\alpha, \Sigma_\alpha, P, \Omega_\alpha, O_\alpha, B_\alpha, R \rangle$, we can construct a single-agent POMDP,
$\langle S', A', P', \Omega', O', R' \rangle$, as follows:

$S' = S$

$A' = A_\alpha$

$P'(s, a, s') = P(s, a, s')$

$\Omega' = \Omega_\alpha$

$O'(s, a, \omega) = O_\alpha(s, a, \omega)$

$R'(s, a) = R_A(s, a)$

From Theorem 2, we need to consider only the complete-communication policy for the
COM-MTDP, and this policy has a communication-level reward of zero. Therefore, the decision
problem for the COM-MTDP is simply to find a domain-level policy that produces an expected reward
exceeding K. Given full communication, the state estimator functions for the COM-MTDP
(as shown in the proof of Theorem 2) reduce to Equation 10. A policy for our POMDP
specifies an action for each and every history of observations: $\pi' : \bigcup_{j=1}^{T} (\Omega')^j \rightarrow A'$. The
history of observations for the single-agent POMDP corresponds to the belief states of our
COM-MTDP under full communication. Therefore, we can translate a POMDP policy, $\pi'$,
into an equivalent domain-level policy for the COM-MTDP:

$$\pi_{\alpha A}(\langle \omega_0, \omega_1, \ldots, \omega_t \rangle) = \pi'(\langle \omega_0, \omega_1, \ldots, \omega_t \rangle) \tag{12}$$

A team following $\pi_{\alpha A}$ will perform the exact same domain-level actions as a single agent
following $\pi'$. Thus, there exists a policy with expected utility greater than K for the COM-MTDP
if and only if there exists one for the POMDP. The decision problem for a POMDP
is known to be in PSPACE (Papadimitriou & Tsitsiklis, 1987), so the COM-MTDP problem
(under free communication) must be in PSPACE as well. □

Theorem 4 The decision problem of determining whether there exist policies, $\pi_{\alpha\Sigma}$ and
$\pi_{\alpha A}$, for a given COM-MTDP with free communication and collective observability, that
yield a total reward at least K over some finite horizon T is P-complete. 

Proof: The proof follows that of Theorem 3, but with a reduction to and from the MDP 
decision problem, rather than the POMDP. The MDP decision problem is P-complete
(Papadimitriou & Tsitsiklis, 1987). □
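
The membership direction of Theorems 3 and 4 treats the whole team as one agent whose actions and observations are the joint ones. A minimal sketch of that construction follows, assuming the component sets are given as Python lists and the model functions as callables over joint tuples. The container and field names are illustrative assumptions, not the interface of the paper's software.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Sequence, Tuple


@dataclass
class SingleAgentPOMDP:
    states: List[str]
    actions: List[Tuple[str, ...]]       # joint actions become single actions
    observations: List[Tuple[str, ...]]  # joint observations likewise
    transition: Callable                 # P(s, a, s')
    observation_fn: Callable             # O(s, a, omega)
    reward: Callable                     # R_A(s, a)


def team_to_pomdp(states: Sequence[str],
                  agent_actions: Sequence[Sequence[str]],
                  agent_observations: Sequence[Sequence[str]],
                  transition: Callable,
                  observation_fn: Callable,
                  domain_reward: Callable) -> SingleAgentPOMDP:
    """Collapse a free-communication team model into one POMDP.

    The communication components are dropped: by Theorem 2 the team may
    assume complete communication, which costs nothing when it is free.
    """
    return SingleAgentPOMDP(
        states=list(states),
        actions=list(product(*agent_actions)),
        observations=list(product(*agent_observations)),
        transition=transition,
        observation_fn=observation_fn,
        reward=domain_reward,
    )
```

Under collective observability the joint observation identifies the state, so the same collapse yields a fully observable problem, which is the case Theorem 4 reduces to an MDP.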

Theorem 5 The decision problem of determining whether there exist policies, $\pi_{\alpha\Sigma}$ and
$\pi_{\alpha A}$, for a given COM-MTDP with individual observability, that yield a total reward at
least K over some finite horizon T (given integers K and T) is P-complete.

Proof: The proof follows that of Theorem 4, except that we can reduce the problem to 
and from an MDP regardless of what communication policy the team uses. □ 

Theorem 6 The decision problem of determining whether there exist policies, $\pi_{\alpha\Sigma}$ and
$\pi_{\alpha A}$, for a given COM-MTDP with non-observability, that yield a total reward at least K
over some finite horizon T (given integers K and T) is NP-complete. 

Proof: The proof follows that of Theorem 4, except that we can reduce the problem to and
from a single-agent non-observable MDP (NOMDP) regardless of what communication
policy the team uses. In particular, because the agents are all equally ignorant of the state,
communication has no effect. The NOMDP decision problem is NP-complete (Papadimitriou
& Tsitsiklis, 1987). □

Thus, we have used the COM-MTDP framework to characterize the difficulty of problem 
domains in agent teamwork along the dimensions of communication cost and observability. 
Table 2 summarizes our results, which we can use in deciding where to concentrate our 
energies in attacking teamwork problems. We can use these results to draw some conclusions 
about the challenges to designers of multiagent teams: 

• The greatest challenges lie in those domains with either collective observability or 
collective partial observability and with nonzero communication cost. 

• Under collective observability and collective partial observability, teamwork without 
communication is highly intractable, but, with free communication, the complexity 
becomes on par with that of single-agent planning problems. 

• Agent team designers have much to gain by increasing the observational capabilities of 
their team (e.g., by adding new sensor agents) because of the reduction in complexity 
gained by making the domain collectively observable. 

• Furthermore, the results from Theorems 3 and 4 hold in any domain where the result
from Theorem 2 holds (i.e., when complete communication is the dominant policy).
Therefore, while perfectly free communication may be rare, these results show that
investment in communication in teamwork can pay off with a significant simplification
of optimal teamwork.

• On the other hand, when the world is individually observable or non-observable,
communication makes no difference in performance.

• It should be noted that even under those conditions where the problem is P-complete,
the complexity of optimal teamwork is polynomial in the number of states of the
world, which may still be impractically high.

• The above complexity results pertain to finding policies that are optimal subject to
the domain properties. We will find different expected rewards of the optimal policies
under different observability and communication properties. For instance, cutting off
all of the agents' sensors makes the domain non-observable and reduces the complexity
of generating an optimal policy from NEXP to NP, but we would expect an associated
drop in the expected reward achieved by the team.

                 Individually   Collectively     Collectively           Non-
                 Observable     Observable       Partially Observable   Observable
No Comm.         P-complete     NEXP-complete    NEXP-complete          NP-complete
General Comm.    P-complete     NEXP-complete    NEXP-complete          NP-complete
Free Comm.       P-complete     P-complete       PSPACE-complete        NP-complete

Table 2: Time complexity of COM-MTDPs.
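
For readers who want to query the classification programmatically, the entries of Table 2 can be stored as a small lookup. This is only a transcription of the table under assumed key names ("none", "general", "free" for communication; "individual", "collective", "collective_partial", "none" for observability), which are illustrative labels rather than terms from the paper.

```python
# Complexity of the team decision problem, keyed by
# (communication regime, observability class), transcribed from Table 2.
COMPLEXITY = {
    ("none", "individual"): "P-complete",
    ("none", "collective"): "NEXP-complete",
    ("none", "collective_partial"): "NEXP-complete",
    ("none", "none"): "NP-complete",
    ("general", "individual"): "P-complete",
    ("general", "collective"): "NEXP-complete",
    ("general", "collective_partial"): "NEXP-complete",
    ("general", "none"): "NP-complete",
    ("free", "individual"): "P-complete",
    ("free", "collective"): "P-complete",
    ("free", "collective_partial"): "PSPACE-complete",
    ("free", "none"): "NP-complete",
}


def complexity(communication: str, observability: str) -> str:
    """Return the Table 2 entry for the given domain properties."""
    return COMPLEXITY[(communication, observability)]


def communication_matters(observability: str) -> bool:
    """True if free communication changes the complexity class."""
    return complexity("general", observability) != complexity("free", observability)
```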

4. Evaluating Team Coordination 

Table 2 shows that providing optimal domain-level and communication policies for teams is 
a difficult challenge. Many systems alleviate this difficulty by having domain experts provide
the domain-level plans (Tambe, 1997; Tidhar, 1993). Then, the problem for the agents
reduces to generating the appropriate team coordination, $\pi_{\alpha\Sigma}$, to ensure that they properly
execute the domain-level plans, $\pi_{\alpha A}$. In this section, we demonstrate the COM-MTDP
framework's ability to analyze existing teamwork approaches in the literature. Our method- 
ology for such analysis begins by encoding such a teamwork method as a communication- 
level policy. In other words, we translate the method into an algorithm that maps agent 
beliefs (e.g., observation sequences) into communication decisions. To evaluate the per- 
formance of this policy, we then instantiate a COM-MTDP that represents the states, 
transition probabilities, and reward function of a domain of interest. Our methodology 
provides an evaluation of the policy in terms of the expected reward earned by the team 
when following the policy in the specified domain. 

We demonstrate this methodology by using our COM-MTDP framework to analyze joint 
intentions theory (Cohen & Levesque, 1991b, 1991a; Levesque et al., 1990), which provides
a common basis for many existing approaches to team coordination. Section 4.1 models two 
key instantiations of joint intentions taken from the literature (Jennings, 1995; Tambe, 1997) 
as COM-MTDP communication policies. Section 4.2 analyzes the conditions under which 
these policies generate optimal behavior and provides a third candidate policy that makes 
communication decisions that are locally optimal within the context of joint intentions. In
addition to providing the results for the particular team coordination strategies investigated,
this section also illustrates a general methodology by which one can use our COM-MTDP 
framework to encode and evaluate coordination strategies proposed by existing multiagent 
research. 

4.1 Joint Intentions in a COM-MTDP 

Joint intentions theory provides a prescriptive framework for multiagent coordination in a
team setting. It does not make any claims of optimality in its teamwork, but it provides 
theoretical justifications for its prescriptions, grounded in the attainment of mutual belief 
among the team members. We can use the COM-MTDP framework to identify the domain 
properties under which attaining mutual belief generates optimal behavior and to quantify 
precisely how suboptimal the performance will be otherwise. 

Joint intentions theory requires that team members jointly commit to a joint persistent 
goal, G. It also requires that when any team member privately believes that G is achieved 
(or unachievable or irrelevant), it must then attain mutual belief throughout the team 
about this achievement (or unachievability or irrelevance). To encode this prescription of 
joint intentions theory within our COM-MTDP model, we first specify the joint goal, G, as 
a subset of states, $G \subseteq S$, where the desired goal is achieved (or unachievable or irrelevant).

Presumably, such a prescription indicates that joint intentions are not specifically in- 
tended for individually observable environments. Upon achieving the goal in an individually 
observable environment, each agent would simultaneously observe that $S^t \in G$. Because
of our assumption that the COM-MTDP model components (including $O_\alpha$) are common
knowledge to the team, each agent would also simultaneously come to believe that its teammates
have observed that $S^t \in G$, and that its teammates believe that it believes that all
of the team members have observed that $S^t \in G$, and so on. Thus, the team immediately
attains mutual belief in the achievement of the goal under individual observability without 
any additional communication necessary by the team. 

Instead, the joint intentions framework aims at domains with some degree of unobservability.
In such domains, the agents must signal the other agents, either through communication
or some informative domain-level action, to attain mutual belief. However, we can
also assume that joint intentions theory does not focus on domains with free communication,
where Theorem 2 shows that we can simply have the agents communicate everything, all 
the time, without the need for more complex prescriptions. 

The joint intentions framework does not specify a precise communication policy for the
attainment of mutual belief. In this paper, we focus on communication only in the case of
goal achievement, but our methodology extends to handle unachievability and irrelevance as
well. One well-known approach (Jennings, 1995) applied joint intentions theory by having
the agents communicate the achievement of the joint goal, G, as soon as they believe G to be
true. To instantiate the behavior of Jennings' agents within a COM-MTDP, we construct a
communication policy, $\pi^J_{\alpha\Sigma}$, that specifies that an agent sends the special message, $\sigma_G$, when
it first believes that G holds. Following joint intentions' assumption of sincerity (Smith &
Cohen, 1996), we require that the agents never select the special $\sigma_G$ message in a belief
state unless they believe G to be true with certainty. With this requirement and with our
assumption of the team's common knowledge of the communication model, we can assume
that all of the other agents immediately accept the special message, $\sigma_G$, as true, and that
the agents know that all their team members accept the message as true, and so on. Thus,
the team attains mutual belief that G is true immediately upon receiving the message, $\sigma_G$.
We can construct the communication policy, $\pi^J_{\alpha\Sigma}$, in constant time.

The STEAM algorithm is another instantiation of joint intentions that has had success in
several real-world domains (Tambe, 1997; Pynadath et al., 1999; Tambe, Pynadath, Chauvat,
Das, & Kaminka, 2000; Pynadath & Tambe, 2002). Unlike Jennings' instantiation, the
STEAM teamwork model includes decision-theoretic communication selectivity. A domain
specification includes two parameters for each joint commitment, G: $\tau$, the probability of
miscoordinated termination of G; and $C_{mt}$, the cost of miscoordinated termination of G. In
this context, "miscoordinated termination" means that some agents immediately observe
that the team has achieved G while the rest do not. STEAM's domain specification also
includes a third parameter, $C_c$, to represent the cost of communication of a fact (e.g., the
achievement of G). Using these parameters, the STEAM algorithm evaluates whether the
expected cost of miscoordination outweighs the cost of communication. STEAM expresses
this criterion as the following inequality: $\tau \cdot C_{mt} > C_c$. We can define a communication
policy, $\pi^S_{\alpha\Sigma}$, based on this criterion: if the inequality holds, then an agent that has observed
the achievement of G will send the message, $\sigma_G$; otherwise, it will not. We can construct
$\pi^S_{\alpha\Sigma}$ in constant time.
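
The two communication policies just described can be sketched as one-line decision rules. The function and parameter names below are illustrative assumptions; the sketch captures only the send/do-not-send decision at the moment an agent first believes G has been achieved, not the rest of either teamwork model.

```python
def jennings_sends(first_believes_goal_achieved: bool) -> bool:
    """Jennings-style rule: always announce G as soon as it is believed."""
    return first_believes_goal_achieved


def steam_sends(first_believes_goal_achieved: bool,
                tau: float,
                cost_miscoordination: float,
                cost_communication: float) -> bool:
    """STEAM-style rule: announce G only if tau * C_mt > C_c."""
    if not first_believes_goal_achieved:
        return False
    return tau * cost_miscoordination > cost_communication
```

For example, with the hypothetical values tau = 0.5, C_mt = 10 and C_c = 3, the STEAM rule sends the message, whereas with tau = 0.1 it stays silent; the Jennings rule sends in both cases. Both rules take constant time because neither inspects the agent's belief state beyond the single flag.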

4.2 Locally Optimal Policy 

Although the STEAM policy is more selective than Jennings', it remains unanswered 
whether it is optimally selective, and researchers continue to struggle with the question 
of when agents should communicate (Yen et al., 2001). The few reports of suboptimal 
(in particular, excessive) communication in STEAM characterized the phenomenon as an 
exceptional circumstance, but it is also possible that STEAM's optimal performance is the 
exception. We use the COM-MTDP model to derive an analytical characterization of opti- 
mal communication here, while Section 5 provides an empirical one by creating an algorithm 
using that characterization. 

Both policies, $\pi^J_{\alpha\Sigma}$ and $\pi^S_{\alpha\Sigma}$, consider sending $\sigma_G$ only when an agent first believes that
G has been achieved. Once an agent has the relevant belief, they make different choices, and
we consider here what the optimal decision is at this point. The domain is not individually
observable, so certain agents may be unaware of the achievement of G. When not sending
the $\sigma_G$ message, these unaware agents may unnecessarily continue performing actions in
the pursuit of achieving G. The performance of these extraneous actions could potentially
incur costs and lead to a lower utility than one would expect when sending the $\sigma_G$ message.

The decision to send $\sigma_G$ or not matters only if the team achieves G and one agent
comes to know this fact. We define the random variable, $T_G$, to be the earliest time at
which an agent knows this fact. We denote agent $K_G$ as the agent who knows of the
achievement at time $T_G$. If $K_G = i$, for some agent, $i$, and $T_G = t_0$, then agent $i$ has some
pre-communication belief state, $b^{t_0}_{i\bullet\Sigma} = \beta$, that indicates that G has been achieved. To more
precisely quantify the difference between agent $i$ sending the $\sigma_G$ message at time $T_G$ vs.
never sending it, we define the following value:

$$\Delta^T(t_0, i, \beta) \equiv E\left[\sum_{t=0}^{T-t_0} R^{t_0+t} \,\middle|\, \Sigma^{t_0}_i = \sigma_G, T_G = t_0, K_G = i, b^{t_0}_{i\bullet\Sigma} = \beta\right] - E\left[\sum_{t=0}^{T-t_0} R^{t_0+t} \,\middle|\, \Sigma^{t_0}_i = \mathit{null}, T_G = t_0, K_G = i, b^{t_0}_{i\bullet\Sigma} = \beta\right] \tag{13}$$



We assume that, for all times other than $T_G$, the agents follow some communication policy,
$\pi_{\alpha\Sigma}$, that never specifies $\sigma_G$. Thus, $\Delta^T$ measures the difference in expected reward that
hinges on agent $i$'s specific decision to send or not send $\sigma_G$ at time $t_0$. Given this definition,
it is locally optimal for agent $i$ to send the special message, $\sigma_G$, at time $t_0$, if and only
if $\Delta^T \geq 0$. We define the communication policy, $\pi_{\alpha\Sigma+\sigma}$, as the communication policy
following $\pi_{\alpha\Sigma}$ for all agents at all times, except for agent $i$ under belief state $\beta$, when
agent $i$ sends message $\sigma$. With this definition, $\pi_{\alpha\Sigma+\sigma_G}$ is the policy under which agent $i$
communicates the achievement of G, and $\pi_{\alpha\Sigma+\mathit{null}}$ is the policy under which it does not.
Therefore, we can alternatively describe agent $i$'s decision criterion as choosing $\pi_{\alpha\Sigma+\sigma_G}$
over $\pi_{\alpha\Sigma+\mathit{null}}$ if and only if $\Delta^T \geq 0$.

Unfortunately, while Equation 13 identifies an exact criterion for locally optimal commu- 
nication, this criterion is not yet operational. In other words, we can not directly implement 
it as a communication policy for the agents. Furthermore, Equation 13 hides the underly- 
ing complexity of the computation involved, which is one of the key goals of our analysis. 
Therefore, we use the COM-MTDP model to derive an operational expression of $\Delta^T \geq 0$.
For simplicity, we define notational shorthand for various sequences and combinations of
values. We define a partial sequence of random variables, $X^{<t}$, to be the sequence of random
variables for all times before $t$: $X^0, X^1, \ldots, X^{t-1}$. We make similar definitions for the
other relational operators (i.e., $X^{>t}$, $X^{\geq t}$, etc.). The expression, $(S)^T$, denotes the cross
product over states of the world, $\prod_{t=0}^{T} S$, as distinguished from the time-indexed random
variable, $S^T$, which denotes the value of the state at time $T$. The notation, $s^{\geq t_0}[t]$, specifies
the element in slot $t$ within the vector $s^{\geq t_0}$. We define the function, $\Upsilon$, as shorthand within
our probability expressions. It allows us to compactly represent a particular subsequence
of world and agent belief states occurring, conditioned on the current situation, as follows:



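$$\Pr\left(\Upsilon\left(\langle t, t' \rangle, s, \beta_{\bullet\Sigma}\right)\right) \equiv \Pr\left(S^{\geq t, \leq t'} = s, \; b^{\geq t, \leq t'}_{\alpha\bullet\Sigma} = \beta_{\bullet\Sigma} \,\middle|\, T_G = t_0, K_G = i, b^{t_0}_{i\bullet\Sigma} = \beta\right) \tag{14}$$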



Informally, $\Upsilon(\langle t, t' \rangle, s, \beta_{\bullet\Sigma})$ represents the event that the world and belief states from time
$t$ through $t'$ correspond to the specified sequences, $s$ and $\beta_{\bullet\Sigma}$, respectively, conditioned on
agent $i$ being the first to know of G's achievement at time $t_0$ with a belief state, $\beta$. We define
the function, $\beta_{\Sigma\bullet}$, to map a pre-communication belief state into the post-communication
belief state that arises from a communication policy:

$$\beta_{\Sigma\bullet}(\beta_{\bullet\Sigma}, \pi_{\alpha\Sigma}) \equiv SE_{\alpha\Sigma\bullet}\left(\beta_{\bullet\Sigma}, \pi_{\alpha\Sigma}(\beta_{\bullet\Sigma})\right) \tag{15}$$



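This definition of $\beta_{\Sigma\bullet}$ is a well-defined function because of the deterministic nature of the
policy, $\pi_{\alpha\Sigma}$, and state-estimator function, $SE_{\alpha\Sigma\bullet}$.

Theorem 7 If we assume that, upon achievement of G, no communication other than $\sigma_G$
is possible, then the condition $\Delta^T(t_0, i, \beta) \geq 0$ holds if and only if:

$$
\begin{aligned}
&\sum_{s^{\leq t_0}} \sum_{\beta^{\leq t_0}_{\bullet\Sigma}} \Pr\left(\Upsilon\left(\langle 0, t_0 \rangle, s^{\leq t_0}, \beta^{\leq t_0}_{\bullet\Sigma}\right)\right) \\
&\quad \cdot \Bigg( \sum_{s^{\geq t_0}} \sum_{\beta^{\geq t_0}_{\bullet\Sigma}} \Pr\left(\Upsilon\left(\langle t_0, T \rangle, s^{\geq t_0}, \beta^{\geq t_0}_{\bullet\Sigma}\right) \,\middle|\, \Sigma^{t_0}_i = \sigma_G, \Upsilon\left(\langle 0, t_0 \rangle, s^{\leq t_0}, \beta^{\leq t_0}_{\bullet\Sigma}\right)\right) \\
&\qquad\qquad \cdot \sum_{t=t_0}^{T} R_A\left(s^{\geq t_0}[t], \pi_{\alpha A}\left(\beta_{\Sigma\bullet}\left(\beta^{\geq t_0}_{\bullet\Sigma}[t], \pi_{\alpha\Sigma+\sigma_G}\right)\right)\right) \\
&\qquad - \sum_{s^{\geq t_0}} \sum_{\beta^{\geq t_0}_{\bullet\Sigma}} \Pr\left(\Upsilon\left(\langle t_0, T \rangle, s^{\geq t_0}, \beta^{\geq t_0}_{\bullet\Sigma}\right) \,\middle|\, \Sigma^{t_0}_i = \mathit{null}, \Upsilon\left(\langle 0, t_0 \rangle, s^{\leq t_0}, \beta^{\leq t_0}_{\bullet\Sigma}\right)\right) \\
&\qquad\qquad \cdot \sum_{t=t_0}^{T} R_A\left(s^{\geq t_0}[t], \pi_{\alpha A}\left(\beta_{\Sigma\bullet}\left(\beta^{\geq t_0}_{\bullet\Sigma}[t], \pi_{\alpha\Sigma+\mathit{null}}\right)\right)\right) \Bigg) \\
&\geq - \sum_{s \in S} \sum_{\beta \in B_\alpha} \Pr\left(\Upsilon\left(\langle t_0, t_0 \rangle, s, \beta\right)\right) R_\Sigma(s, \sigma_G)
\end{aligned}
\tag{16}
$$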



Proof: The complete proof of this theorem appears in Online Appendix 1.
The definition of $\Delta^T$ in Equation 13 is the difference between two expectations, where each
expectation is a sum over the possible trajectories of the agent team. Each trajectory must
include a sequence of possible world states, since the agents' reward at each point in time
depends on the particular state of the world at that time. The agents' reward also depends
on their actions (both domain- and communication-level). These actions are deterministic,
given the agents' policies, $\pi_{\alpha A}$ and $\pi_{\alpha\Sigma}$, and their belief states. Thus, in addition to summing
over the possible states of the world, we must also sum over the possible states of the agents'
beliefs (both pre- and post-communication):

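$$
\begin{aligned}
\Delta^T(t_0, i, \beta)
= &\sum_{s^{\leq T} \in (S)^T} \sum_{\beta^{\leq T}_{\bullet\Sigma} \in (B)^T} \sum_{\beta^{\leq T}_{\Sigma\bullet} \in (B)^T} \Pr\left(S^{\leq T} = s^{\leq T}, b^{\leq T}_{\bullet\Sigma} = \beta^{\leq T}_{\bullet\Sigma}, b^{\leq T}_{\Sigma\bullet} = \beta^{\leq T}_{\Sigma\bullet} \,\middle|\, \Sigma^{t_0}_i = \sigma_G, T_G = t_0, K_G = i, b^{t_0}_{i\bullet\Sigma} = \beta\right) \\
&\qquad \cdot \sum_{t=t_0}^{T} R\left(s^{\leq T}[t], \pi_{\alpha A}\left(\beta^{\leq T}_{\Sigma\bullet}[t]\right), \pi_{\alpha\Sigma}\left(\beta^{\leq T}_{\bullet\Sigma}[t]\right)\right) \\
- &\sum_{s^{\leq T} \in (S)^T} \sum_{\beta^{\leq T}_{\bullet\Sigma} \in (B)^T} \sum_{\beta^{\leq T}_{\Sigma\bullet} \in (B)^T} \Pr\left(S^{\leq T} = s^{\leq T}, b^{\leq T}_{\bullet\Sigma} = \beta^{\leq T}_{\bullet\Sigma}, b^{\leq T}_{\Sigma\bullet} = \beta^{\leq T}_{\Sigma\bullet} \,\middle|\, \Sigma^{t_0}_i = \mathit{null}, T_G = t_0, K_G = i, b^{t_0}_{i\bullet\Sigma} = \beta\right) \\
&\qquad \cdot \sum_{t=t_0}^{T} R\left(s^{\leq T}[t], \pi_{\alpha A}\left(\beta^{\leq T}_{\Sigma\bullet}[t]\right), \pi_{\alpha\Sigma}\left(\beta^{\leq T}_{\bullet\Sigma}[t]\right)\right)
\end{aligned}
\tag{17}
$$

We can rewrite these summations more simply using our various shorthand notations:

$$
\begin{aligned}
\Delta^T(t_0, i, \beta)
= &\sum_{s^{\leq T} \in (S)^T} \sum_{\beta^{\leq T}_{\bullet\Sigma} \in (B)^T} \Pr\left(\Upsilon\left(\langle 0, T \rangle, s^{\leq T}, \beta^{\leq T}_{\bullet\Sigma}\right) \,\middle|\, \Sigma^{t_0}_i = \sigma_G\right) \\
&\qquad \cdot \sum_{t=t_0}^{T} R\left(s^{\leq T}[t], \pi_{\alpha A}\left(\beta_{\Sigma\bullet}\left(\beta^{\leq T}_{\bullet\Sigma}[t], \pi_{\alpha\Sigma+\sigma_G}\right)\right), \pi_{\alpha\Sigma+\sigma_G}\left(\beta^{\leq T}_{\bullet\Sigma}[t]\right)\right) \\
- &\sum_{s^{\leq T} \in (S)^T} \sum_{\beta^{\leq T}_{\bullet\Sigma} \in (B)^T} \Pr\left(\Upsilon\left(\langle 0, T \rangle, s^{\leq T}, \beta^{\leq T}_{\bullet\Sigma}\right) \,\middle|\, \Sigma^{t_0}_i = \mathit{null}\right) \\
&\qquad \cdot \sum_{t=t_0}^{T} R\left(s^{\leq T}[t], \pi_{\alpha A}\left(\beta_{\Sigma\bullet}\left(\beta^{\leq T}_{\bullet\Sigma}[t], \pi_{\alpha\Sigma+\mathit{null}}\right)\right), \pi_{\alpha\Sigma+\mathit{null}}\left(\beta^{\leq T}_{\bullet\Sigma}[t]\right)\right)
\end{aligned}
\tag{18}
$$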

The remaining derivation exploits our Markovian assumptions to rearrange the summations 
and cancel like terms to produce the theorem's result. □ 

Theorem 7 states, informally, that we prefer sending $\sigma_G$ whenever the cost of execution
after achieving G outweighs the cost of communication of the fact that G has been
achieved. More precisely, the outer summations on the left-hand side of the inequality
iterate over all possible past histories of world and belief states, producing a probability
distribution over the possible states the team can be in at time $t_0$. For each such state, the
expression inside the parentheses computes the difference in domain-level reward, over all
possible future sequences of world and belief states, between sending and not sending $\sigma_G$.
By our theorem's assumption that no communication other than $\sigma_G$ is possible after G has
been achieved, we can ignore any communication costs in the future. However, if we relax
this assumption, we can extend the left-hand side in a straightforward manner into a longer
expression that accounts for the difference in future communication costs as well. Thus, the
left-hand side captures our intuition that, when not communicating, the team will incur a
cost if the agents other than $i$ are unaware of G's achievement. The right-hand side of the
inequality is a summation of the cost of sending the $\sigma_G$ message over possible current states
and belief states.

Table 3: Time complexity of locally optimal decision.

We can use Theorem 7 to derive the locally optimal communication decision across
various classes of problem domains. Under no communication, we cannot send $\sigma_G$. Under
free communication, the right-hand side is 0, so the inequality is always true, and we know
to prefer sending $\sigma_G$. Under no assumptions about communication, the determination is
more complicated. When the domain is individually observable, the left-hand side becomes
0, because all of the agents know that G has been achieved (and thus there is no difference
in execution when sending $\sigma_G$). Therefore, the inequality is always false (unless under free
communication), and we prefer not sending $\sigma_G$. When the environment is not individually
observable and communication is available but not free, then, to be locally optimal at time
$t_0$, agent $i$ must evaluate Inequality 16 in its full complexity. Since the inequality sums
rewards over all possible sequences of states and observations, the time complexity of the
corresponding algorithm is $O((|S| \cdot |\Omega_\alpha|)^T)$. While this complexity is unacceptable for most
real-world problems, it still provides an exponential savings over searching the entire policy
space for the globally optimal policy, where any agent could potentially send $\sigma_G$ at times
other than $T_G$. Table 3 lists the complexity required to determine the locally
optimal policy under the various domain properties.
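
The case analysis above can be illustrated on a deliberately tiny model. The sketch below is not Inequality 16 itself: it assumes a single teammate who, if not told, independently notices the achievement of G with a fixed probability at each step and otherwise earns a (negative) per-step reward for extraneous actions. All parameter names and values are hypothetical.

```python
def expected_domain_reward(t0: int, horizon: int, p_notice: float,
                           unaware_reward: float,
                           p_initially_unaware: float, sent: bool) -> float:
    """Expected domain-level reward from t0 through horizon (inclusive)."""
    if sent:
        return 0.0  # the teammate is informed and takes no extraneous actions
    total = 0.0
    p_unaware = p_initially_unaware
    for _ in range(t0, horizon + 1):
        total += p_unaware * unaware_reward  # penalty accrues while unaware
        p_unaware *= 1.0 - p_notice          # chance of noticing before next step
    return total


def locally_optimal_to_send(t0: int, horizon: int, p_notice: float,
                            unaware_reward: float, comm_reward: float,
                            p_initially_unaware: float = 1.0) -> bool:
    """Send iff the gain in domain-level reward covers the message cost.

    comm_reward plays the role of R_Sigma(s, sigma_G) and is <= 0.
    """
    gain = (expected_domain_reward(t0, horizon, p_notice, unaware_reward,
                                   p_initially_unaware, sent=True)
            - expected_domain_reward(t0, horizon, p_notice, unaware_reward,
                                     p_initially_unaware, sent=False))
    return gain >= -comm_reward
```

Setting `comm_reward=0.0` reproduces the free-communication case (always send), and setting `p_initially_unaware=0.0` reproduces the individually observable case (the gain is zero, so any nonzero cost rules out sending). The exact criterion replaces this closed-form loop with a sum over all state and observation sequences, which is where the exponential cost arises.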

We can now show that although Theorem 7's algorithm for locally optimal communica- 
tion provides a significant computational savings over finding the global optimum, it still 
outperforms existing teamwork models, as exemplified by our $\pi^J_{\alpha\Sigma}$ and $\pi^S_{\alpha\Sigma}$ policies. First,
we can use the criterion of Theorem 7 to evaluate the optimality of the policy, $\pi^J_{\alpha\Sigma}$. If
$\Delta^T(t_0, i, \beta) \geq 0$ for all possible times $t_0$, agents $i$, and belief states $\beta$ that are consistent
with the achievement of the goal G, then the locally optimal policy will always specify
sending $\sigma_G$. In other words, $\pi^J_{\alpha\Sigma}$ will be identical to the locally optimal policy. However,
if the inequality of Theorem 7 is ever false, then $\pi^J_{\alpha\Sigma}$ is not even locally, let alone globally,
optimal.

Second, we can also use Theorem 7 to evaluate STEAM by viewing STEAM's inequality,
$\tau \cdot C_{mt} > C_c$, as a crude approximation of Inequality 16. In fact, there is a clear correspondence
between the terms in the two inequalities. The left-hand side of Inequality 16
computes an exact expected cost of miscoordination. However, unlike STEAM's monolithic
$\tau$ parameter, the optimal criterion evaluates a complete probability distribution over all
possible states of miscoordination by considering all possible past sequences consistent with
the agent's current beliefs. Likewise, unlike STEAM's monolithic $C_{mt}$ parameter, the optimal
criterion looks ahead over all possible future sequences of states to determine the true
expected cost of miscoordination. Furthermore, we can view STEAM's parameter, $C_c$, as an
approximation of the communication cost computed by the right-hand side of Inequality 16.
Again, STEAM uses a single parameter, while the optimal criterion computes an expected
cost over all possible states of the world.

STEAM does have some flexibility in its representation, because $C_{mt}$, $\tau$, and $C_c$ are
not necessarily fixed across the entire domain. For instance, $C_{mt}$ may vary based on the
specific joint plan that the agents may have jointly committed to (i.e., there may be a
different $C_{mt}$ for each goal G). Thus, while Theorem 7 suggests significant additional flexibility
in computing $C_{mt}$ through explicit lookahead, the optimal criterion derived with the
COM-MTDP model also provides a justification for the overall structure behind STEAM's 
approximate criterion. Furthermore, STEAM's emphasis on on-line computation makes the 
computational complexity of Inequality 16 (as presented in Table 3) unacceptable, so the 
approximation error may be acceptable given the gains in efficiency. For a specific domain, 
we can use empirical evaluation (as demonstrated in the next section) to quantify the error 
and efficiency to precisely judge this tradeoff. 

5. Empirical Policy Evaluation 

In addition to providing these analytical results over general classes of problem domains, the 
COM-MTDP framework also supports the analysis of specific domains. Given a particular 
problem domain, we can construct an optimal communication policy or, if the complexity of 
computing an optimal policy is prohibitive, we can instead evaluate and compare candidate 
approximate policies. To provide a reusable tool for such evaluations, we have implemented 
the COM-MTDP model as a Python class with domain-independent methods for the eval- 
uation of arbitrary policies and for the generation of both locally optimal policies using 
Theorem 7 and globally optimal policies through brute-force search of the policy space. 
This software is available in Online Appendix 1. 
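
To illustrate what a brute-force search of the policy space involves, the sketch below enumerates every deterministic communication policy over a small set of belief states and keeps the one with the highest value. It is a hypothetical illustration, not the interface of the software described above: the function names, the belief-state labels, and the toy value function are all assumptions, and a real evaluation would compute the expected reward from the COM-MTDP model.

```python
from itertools import product
from typing import Callable, Dict, Iterator, Sequence, Tuple

Policy = Dict[str, str]  # belief state label -> message


def enumerate_policies(belief_states: Sequence[str],
                       messages: Sequence[str]) -> Iterator[Policy]:
    """Yield every deterministic mapping from belief states to messages."""
    for choice in product(messages, repeat=len(belief_states)):
        yield dict(zip(belief_states, choice))


def best_policy(belief_states: Sequence[str], messages: Sequence[str],
                evaluate: Callable[[Policy], float]) -> Tuple[Policy, float]:
    """Exhaustively search for the policy with the highest value."""
    best, best_value = None, float("-inf")
    for policy in enumerate_policies(belief_states, messages):
        value = evaluate(policy)
        if value > best_value:
            best, best_value = policy, value
    return best, best_value


def toy_value(policy: Policy) -> float:
    """Hypothetical stand-in for expected reward in a tiny domain."""
    value = 0.0
    if policy["goal_seen"] == "sigma_G":
        value += 5.0 - 1.0   # coordination benefit minus message cost
    if policy["goal_not_seen"] == "sigma_G":
        value -= 10.0        # announcing an unachieved goal misleads the team
    return value
```

The number of candidate policies is the number of messages raised to the number of belief states, so this search is practical only for very small domains, which is the motivation for the locally optimal criterion of Theorem 7.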

This section presents results of a COM-MTDP analysis of an example domain involving 
agent-piloted helicopters, where we focus on the key communication decision faced by many 
multiagent frameworks (as described in Section 4), but vary the cost of communication and 
degree of observability to generate a space of distinct domains with different implications 
for the agents' performance. By evaluating communication policies over various configura- 
tions of this particular testbed domain, we demonstrate a methodology by which one can 
use the COM-MTDP framework to model any problem domain and to evaluate candidate 
communication policies for it. 

5.1 Experimental Setup 

Consider two helicopters that must fly across enemy territory to their destination, as il- 
lustrated in Figure 1. The first, piloted by agent Transport, is a transport vehicle with 
limited firepower. The second, piloted by agent Escort, is an escort vehicle with significant 
firepower. Somewhere along their path is an enemy radar unit, but its location is unknown 
(a priori) to the agents. Escort is capable of destroying the radar unit upon encountering 
it. However, Transport is not, but it can escape detection by the radar unit by traveling 
Figure 1: Illustration of helicopter team scenario. 



at a very low altitude (nap-of-the-earth flight), though at a lower speed than at its typical, 
higher altitude. In this scenario, Escort will not worry about detection, given its superior 
firepower; therefore, it will fly at a fast speed at its typical altitude. 

The two agents form a top-level joint commitment, Gd, to reach their destination. 
There is no incentive for the agents to communicate the achievement of this goal, since they 
will both eventually reach their destination with certainty. However, in the service of their 
top-level goal, Gd, the two agents also adopt a joint commitment, Gr, of destroying the 
radar unit. We consider here the problem facing Escort with respect to communicating the 
achievement of goal, Gr. If Escort communicates the achievement of Gr, then Transport 
knows that it is safe to fly at its normal altitude (thus reaching the destination sooner). 
If Escort does not communicate the achievement of Gr, there is still some chance that 
Transport will observe the event anyway. If Transport does not observe the achievement 
of Gr, then it must fly nap-of-the-earth the whole distance, and the team receives a lower 
reward because of the later arrival. Therefore, Escort must weigh the increase in expected 
reward against the cost of communication. 
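
As a rough illustration of the tradeoff just described, the following Python sketch compares the team's expected value with and without the message in a single-shot simplification. It is not the criterion of Theorem 7; the function name, the reward values, and the one-step treatment of observation are assumptions introduced only for this example.

```python
def should_communicate(p_observe, reward_informed, reward_uninformed, comm_cost):
    """Return True when sending the message has higher expected value than silence.

    Hypothetical single-shot simplification: Transport either learns that the
    radar is destroyed (reward_informed) or never learns it (reward_uninformed).
    """
    # Without a message, Transport is informed only if it observes the event.
    value_silent = (p_observe * reward_informed
                    + (1 - p_observe) * reward_uninformed)
    # With a message, Transport is always informed, but the team pays the cost.
    value_message = reward_informed - comm_cost
    return value_message > value_silent


# Illustrative numbers only (not taken from the experiments).
print(should_communicate(0.3, 5.0, 2.0, 0.5))  # low observability
print(should_communicate(0.9, 5.0, 2.0, 0.5))  # high observability
```

With these illustrative numbers, the message pays off when Transport is unlikely to observe the destruction on its own, and does not when observation is nearly certain.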

In the COM-MTDP model of this scenario (presented in Figures 2, 3 and 4), the world 
state is the position (along a straight line between origin and destination) of Transport, 
Escort, and the enemy radar. The enemy is at a randomly selected position somewhere 
in between the agents' initial position and their destination. Transport has no possible 
communication actions, but it can choose between two domain-level actions: flying nap-of- 
the-earth and flying at its normal speed and altitude. Escort has two domain-level actions: 
flying at its normal speed and destroying the radar. Escort also has the option of communicating 
the special message, σ_Gr, indicating that the radar has been destroyed. In the tables 
of Figures 2, 3 and 4, the "•" symbol represents a wild-card (or "don't care") entry. 

If Escort arrives at the radar, then it observes its presence with certainty and can 
destroy it to achieve Gr. The likelihood of Transport's observing the radar's destruction is 
a function of its distance from the radar. We can vary this function's observability parameter 






Pynadath & Tambe 



α = {Escort (E), Transport (T)} 

S = Ξ_E × Ξ_T × Ξ_R 
    Position of Escort:    Ξ_E = {0, 1, ..., 8, 9, Destination} 
    Position of Transport: Ξ_T = {0, 0.5, ..., 9, 9.5, Destination, Destroyed} 
    Position of Radar:     Ξ_R = {1, 2, ..., 8, Destroyed} 

A = A_E × A_T = {fly, destroy, wait} × {fly-NOE, fly-normal, wait} 

Σ = Σ_E × Σ_T = {clear (σ_Gr), null} × {null} 

R_A: 
    Position of Escort    Position of Transport       R_A 
    0, ..., 9             0, ..., 9.5, Destroyed      [illegible] 
    0, ..., 9             Destination                 [illegible] 
    Destination           0, ..., 9.5, Destroyed      r_E 
    Destination           Destination                 r_E + r_T 

R_Σ(s, ⟨null, null⟩) = [illegible] 
R_Σ(s, ⟨σ_Gr, null⟩) = -r_Σ ∈ [0, 1] 









Figure 2: COM-MTDP model of states, actions, and rewards for helicopter scenario. 



(A in Figure 4) within the range [0, 1] to generate distinct domain configurations (0 means 
that Transport will never observe the radar's destruction; 1 means Transport will always 
observe it). If the observability is 1, then they achieve mutual belief of the achievement of 
Gr as soon as it occurs (following the argument presented in Section 4.1). However, for any 
observability less than 1, there is a chance that the agents will not achieve mutual belief 
simply by common observation. The helicopters receive a fixed reward for each time step 
spent at their destination. Thus, for a fixed time horizon, the earlier the helicopters arrive, 
the greater the team's reward. Since flying nap-of-the-earth is slower than normal 
speed, Transport will switch to its normal flying as soon as it either observes that Gr has 
been achieved or Escort sends the message, σ_Gr. Sending the message is not free, so we 
impose a variable communication cost (r_Σ in Figure 2), also within the range [0, 1]. 
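
To make the link between arrival time and reward concrete, the sketch below computes Transport's reward for a fixed horizon under different switching times. The speeds (0.5 per step nap-of-the-earth, 1.0 per step at normal altitude) are an assumption inferred from the position grids in Figure 2, and the distance, horizon, and unit reward are hypothetical values chosen for illustration.

```python
import math

NOE_SPEED = 0.5     # assumed distance per step when flying nap-of-the-earth
NORMAL_SPEED = 1.0  # assumed distance per step at normal altitude


def arrival_step(distance, switch_step=None):
    """Step at which Transport reaches the destination.

    switch_step is the step at which Transport leaves nap-of-the-earth
    flight; None means it never switches.
    """
    noe_arrival = math.ceil(distance / NOE_SPEED)
    if switch_step is None or switch_step >= noe_arrival:
        return noe_arrival
    remaining = distance - NOE_SPEED * switch_step
    return switch_step + math.ceil(remaining / NORMAL_SPEED)


def transport_reward(distance, horizon, switch_step=None, reward_per_step=1.0):
    """Fixed reward for every step of the horizon spent at the destination."""
    steps_at_destination = max(0, horizon - arrival_step(distance, switch_step))
    return reward_per_step * steps_at_destination


# Hypothetical distance of 10 and horizon of 25 steps.
print(transport_reward(10, 25))                 # never learns the radar is gone
print(transport_reward(10, 25, switch_step=4))  # learns it after 4 steps
```

Under these assumptions, the earlier Transport learns that the radar is destroyed, the more steps of the horizon it spends at the destination, which is the quantity that communication can improve.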

We constructed COM-MTDP models of this scenario for each combination of observabil- 
ity and communication cost within the range [0, 1] at 0.1 increments. For each combination, 
we applied the Jennings and STEAM policies, as well as a completely silent policy. For this 
domain, the Jennings policy dictates that Escort always communicate σ_Gr upon destroying 
the radar. For STEAM, we vary the r and Cc parameters with the observability and communication 
cost parameters, respectively. We used two different settings (low and medium) 
for the cost of miscoordination, Cmt. Following the published STEAM algorithm (Tambe, 
1997), Escort sends message σ_Gr if and only if STEAM's inequality, r · Cmt > Cc, holds. 
Thus, the two different settings, low and medium, for Cmt generate two distinct communication 
policies; the high setting is strictly dominated by the other two settings in this domain. 
We also constructed and evaluated locally and globally optimal policies. In applying each 
of these policies, we used our COM-MTDP model to compute the expected reward received 
by the team when following the selected policy. We can uniquely determine this expected 
reward given the candidate communication policy and the particular observability and com- 
munication cost parameters, as well as the COM-MTDP model specified in Figures 2, 3, 
and 4. 
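
The experimental grid and STEAM's decision rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mapping r = 1 - observability and the Cmt value used in the usage lines are hypothetical (the text says only that r varies with observability), and the function names are not those of the software in Online Appendix 1.

```python
from itertools import product

# Observability and communication cost each range over 0.0, 0.1, ..., 1.0.
LEVELS = [i / 10 for i in range(11)]


def steam_communicates(r, c_mt, c_c):
    """STEAM's test: communicate if and only if r * Cmt > Cc."""
    return r * c_mt > c_c


def steam_policy_grid(c_mt):
    """Map each (observability, communication cost) pair to STEAM's decision."""
    grid = {}
    for observability, comm_cost in product(LEVELS, LEVELS):
        r = 1.0 - observability  # assumed mapping from observability to r
        grid[(observability, comm_cost)] = steam_communicates(r, c_mt, comm_cost)
    return grid


# Hypothetical value for the cost of miscoordination.
grid = steam_policy_grid(c_mt=0.5)
print(len(grid))                     # number of domain configurations
print(sum(grid.values()))            # configurations where STEAM communicates
```

Because the rule depends only on the domain parameters, each configuration maps to a single always-or-never decision, which yields one fixed communication policy per Cmt setting.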
Figure 3: COM-MTDP model of transition probabilities for helicopter scenario (excludes 
zero probability rows). 
Figure 5: Suboptimality of silent and Jennings policies. 




Figure 6: Suboptimality of STEAM policy under both low and medium costs of miscoordi- 
nation. 



5.2 Experimental Results 

Figures 5 and 6 plot how much utility the team can expect to lose by following the Jennings, 
silent, and the two STEAM policies instead of the locally optimal communication policy 
(thus, higher values mean worse performance). We can immediately see that the Jennings 
and silent policies are significantly suboptimal for many possible domain configurations. For 
example, not surprisingly, the surface for the Jennings policy peaks (i.e., it does most poorly) 
when the communication cost is high and when the observability is high, while the silent 
policy does poorly under exactly the opposite conditions. 

Previously published results (Jennings, 1995) demonstrated that the Jennings policy 
led to better team performance by reducing waste of effort produced by alternate policies 
like our silent one. These earlier results focused on a single domain, and Figure 5 partially 
confirms their conclusion and shows that the superiority of the Jennings policy over the 
silent policy extends over a broad range of possible domain configurations. On the other 
hand, our COM-MTDP results also show that there is a significant subclass of domains (e.g., 
when communication cost and observability are high) where the Jennings policy is actually 
inferior to the silent policy. Thus, with our COM-MTDP model, we can characterize the 
types of domains where the Jennings policy outperforms the silent policy and vice versa. 
Figure 6 shows the expected value lost by following the two STEAM policies. We can 
view STEAM as trying to intelligently interpolate between the Jennings and silent policies 
based on the particular domain properties. In fact, under a low setting for Cmt, we see 
two thresholds, one along each dimension, at which STEAM switches between following the 
Jennings and silent policies, and its suboptimality is highest at these thresholds. Under 
a medium setting for Cmt, STEAM does not exhibit a threshold along the dimension of 
communication cost, due to the increased cost of miscoordination. Under both settings, 
STEAM's performance generally follows the better of those two fixed policies, so its maximum 
suboptimality (0.587 under both settings) is significantly lower than that of the silent 
(0.700) and Jennings (1.000) policies. Furthermore, STEAM outperforms the two policies 
on average, across the space of domain configurations, as evidenced by its mean suboptimality 
of 0.063 under low Cmt and 0.083 under medium Cmt. Both values are significantly 
lower than the silent policy's mean of 0.160 and the Jennings policy's mean of 0.161. Thus, 
we have been able to quantify the savings provided by STEAM over less selective policies 
within this example domain. 
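
The maximum and mean suboptimality figures quoted above are summary statistics over the grid of domain configurations. The following sketch shows one way such statistics could be computed from per-configuration expected rewards; the function name and the toy reward values are hypothetical and are not the values from these experiments.

```python
from statistics import mean


def suboptimality_summary(optimal_values, policy_values):
    """Return (maximum, mean) expected loss relative to the reference policy.

    Both arguments map a domain configuration, here a pair
    (observability, communication cost), to an expected team reward.
    """
    losses = [optimal_values[cfg] - policy_values[cfg] for cfg in optimal_values]
    return max(losses), mean(losses)


# Toy expected rewards for four configurations (illustrative only).
locally_optimal = {(0.0, 0.0): 2.0, (0.0, 1.0): 1.5, (1.0, 0.0): 2.0, (1.0, 1.0): 2.0}
silent = {(0.0, 0.0): 1.5, (0.0, 1.0): 1.5, (1.0, 0.0): 2.0, (1.0, 1.0): 2.0}

worst, average = suboptimality_summary(locally_optimal, silent)
print(worst, average)
```

Reporting both statistics separates a policy's worst-case loss from its typical loss across the space of configurations.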

However, within a given domain configuration, STEAM must either always or never 
communicate, and this inflexibility leads to significant suboptimality across a wide range 
of domain configurations. On the other hand, Figure 6 also shows that there are domain 
configurations where STEAM is locally optimal. In this relatively small-scale experimental 
testbed, there is no need to incur STEAM's suboptimality, because the agents can compute 
the superior locally optimal policy in under 5 seconds. In larger-scale domains, on the other 
hand, the increased complexity of the locally optimal policy may render its execution 
infeasible. In such domains, STEAM's constant-time execution would potentially make it a 
preferable alternative. This analysis suggests a possible spectrum of algorithms that make 
different optimality-efficiency tradeoffs. 

To understand the cause of STEAM's suboptimality, we can examine its performance 
more deeply in Figures 7 and 8, which plot the expected number of messages sent using 
STEAM (with both low and medium Cmt) vs. the locally optimal policy, at observability 
values of 0.3 and 0.7. STEAM's expected number of messages is either 0 or 1, so STEAM 
can make at most two (instantaneous) transitions between them: one threshold value each 
along the observability and communication cost dimensions. 

From Figures 7 and 8, we see that the optimal policy can be more flexible than STEAM 
by specifying communication contingent on Escort's beliefs beyond simply the achievement 
of Gr. For example, consider the messages sent under low Cmt in Figure 7, where STEAM 
matches the locally optimal policy at the extremes of the communication cost dimension. 
Even if the communication cost is high, it is still worth sending message σ_Gr in states where 
Transport is still very far from the destination. Thus, the surface for the optimal policy 
makes a more gradual transition from always communicating to never communicating. We 
can thus view STEAM's surface as a crude approximation to the optimal surface, subject 
to STEAM's fewer degrees of freedom. 

We can also use Figures 7 and 8 to identify the domain conditions under which joint 
intentions theory's prescription of attaining mutual belief is or is not optimal. In particular, 
for any domain where the observability is less than 1, the agents will not attain mutual belief 
without communication. In both Figures 7 and 8, there are many domain configurations 
where the locally optimal policy is expected to send fewer than 1 σ_Gr message. Each of 
Figure 7: Expected number of messages sent by STEAM and locally optimal policies when 
the observability is 0.3. 




Figure 8: Expected number of messages sent by STEAM and locally optimal policies when 
the observability is 0.7. Under both settings, STEAM sends messages. 
Figure 9: Suboptimality of locally optimal policy. 

these configurations represents a domain where the locally optimal policy will not attain 
mutual belief in at least one case. Therefore, attaining mutual belief is suboptimal in those 
configurations! 

These experiments illustrate that STEAM, despite its decision-theoretic communication 
selectivity, may communicate suboptimally under a significant class of domain configura- 
tions. Previous work on STEAM-based, real-world, agent-team implementations informally 
noted suboptimality in an isolated configuration within a more realistic helicopter trans- 
port domain (Tambe, 1997). Unfortunately, this previous work treated that suboptimality 
(where the agents communicated more than necessary) as an isolated aberration, so there 
was no investigation of the degree of such suboptimality, nor of the conditions under which 
such suboptimality may occur in practice. We re-created these conditions within the experimental 
testbed of this section by using a medium Cmt. The resulting experiments (as shown 
in Figure 7) illustrated that the observed suboptimality was not an isolated phenomenon, 
but, in fact, that STEAM has a general propensity towards extraneous communication in 
situations involving low observability (i.e., low likelihood of mutual belief) and high com- 
munication costs. This result matches the situation where the "aberration" occurred in the 
more realistic domain. 

The locally optimal policy is itself suboptimal with respect to the globally optimal 
policy, as we can see from Figure 9. Under domain configurations with high observability, 
the globally optimal policy has the escort wait an additional time step after destroying 
the radar and then communicate only if the transport continues flying nap-of-the-earth. 
The escort cannot directly observe which method of flight the transport has chosen, but 
it can measure the change in the transport's position (since it maintains a history of its 
past observations) and thus infer the method of flight with complete accuracy. In a sense, 
the escort following the globally optimal policy is performing plan recognition to analyze 
the transport's possible beliefs. It is particularly noteworthy that our domain specification 
does not explicitly encode this recognition capability. In fact, our algorithm for finding the 
globally optimal policy does not even make any of the assumptions made by our locally 
optimal policy (i.e., a single agent is deciding whether to communicate or not, regarding 
a single message, at a single point in time); rather, our general-purpose search algorithm 
traverses the policy space and "discovers" this possible means of inference on its own. We 
expect that such COM-MTDP analysis can provide an automatic method for discovering 
novel communication policies of this type in other domains, even those modeling real-world 
problems. 
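
The inference that the globally optimal policy discovers can be written out by hand as a small sketch. The per-step displacements (0.5 for nap-of-the-earth, 1.0 for normal flight) are an assumption inferred from the position grids in Figure 2, and the function names are hypothetical; the search algorithm itself encodes no such rule explicitly.

```python
NOE_STEP = 0.5     # assumed displacement per step when flying nap-of-the-earth
NORMAL_STEP = 1.0  # assumed displacement per step at normal altitude


def infer_flight_mode(prev_position, curr_position):
    """Infer Transport's last action from two consecutive observed positions."""
    delta = curr_position - prev_position
    if delta == NOE_STEP:
        return "fly-NOE"
    if delta == NORMAL_STEP:
        return "fly-normal"
    return "unknown"


def escort_should_send(prev_position, curr_position):
    """Wait-then-communicate rule: send only if Transport is still flying low.

    Called one step after the radar is destroyed. If Transport already flies
    normally, it must have observed the destruction, so no message is needed.
    """
    return infer_flight_mode(prev_position, curr_position) == "fly-NOE"


print(escort_should_send(3.0, 3.5))  # Transport still nap-of-the-earth
print(escort_should_send(3.5, 4.5))  # Transport already flying normally
```

Waiting one step lets Escort pay the communication cost only in the cases where Transport failed to observe the destruction itself.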

Indeed, by exploiting this discovery capability within our example domain, the globally 
optimal policy gains a slight advantage in expected utility over the locally optimal policy, 
with a mean difference of 0.011, standard deviation of 0.027, and maximum of 0.120. On the 
other hand, our domain-independent code never required more than 5 seconds to compute 
the locally optimal policy in this testbed, while our domain-independent search algorithm 
always required more than 150 minutes to find the globally optimal policy. Thus, through 
Theorem 7, we have used the COM-MTDP model to construct a communication policy 
that, for this testbed domain, performs almost optimally and outperforms existing team- 
work theories, with a substantial computational savings over finding the globally optimal 
policy. Although these results hold for an isolated communication decision, we expect the 
relative performance of the policies to stay the same even with multiple decisions, where the 
inflexibility of the suboptimal policies will only exacerbate their losses (i.e., the shapes of 
the graphs would stay roughly the same, but the suboptimality magnitudes would increase). 

6. Summary 

The COM-MTDP model is a novel framework that complements existing teamwork research 
by providing the previously lacking capability to analyze the optimality and complexity of 
team decisions. While grounded within economic team theory, the COM-MTDP's exten- 
sions to include communication and dynamism allow it to subsume many existing multiagent 
models. We were able to exploit the COM-MTDP's ability to represent broad classes of 
multiagent team domains to derive complexity results for optimal agent teamwork under 
arbitrary problem domains. We also used the model to identify domain properties that can 
simplify that complexity. 

The COM-MTDP framework provides a general methodology for analysis across both 
general domain subclasses and specific domain instantiations. As demonstrated in Section 4, 
we can express important existing teamwork theories within a COM-MTDP framework and 
derive broadly applicable theoretical results about their optimality. Section 5 demonstrates 
our methodology for the analysis of a specific domain. By encoding a teamwork problem as 
a COM-MTDP, we can leverage our general-purpose software tools (available in 
Online Appendix 1) to evaluate the optimality of teamwork based on potentially any other 
existing theory, as demonstrated in this paper using two leading instantiations of joint 
intentions theory. In combining both theory and practice, we can use the theoretical results 
derived using the COM-MTDP framework as the basis for new algorithms to extend our 
software tools, just as we did in translating Theorem 7 from Section 4 into an implemented 
algorithm for locally optimal communication in Section 5. We expect that the COM-MTDP 
framework, the theorems and complexity results, and the reusable software will form a basis 
for further analysis of teamwork, both by ourselves and others in the field. 
7. Future Work for COM-MTDP Team Analysis 

While our initial COM-MTDP results are promising, there remain at least three key areas 
where future progress in COM-MTDPs is critical. First, analysis using COM-MTDPs (such 
as the one presented in Section 5) requires knowledge of the rewards, transition probabil- 
ities, and observation probabilities, as well as of the competing policies governing agent 
behavior. It may not always be possible to have such a model of the domain and agents' 
policies readily available. Indeed, other proposed team-analysis techniques (Nair, Tambe, 
Marsella, & Raines, 2002b; Raines, Tambe, & Marsella, 2000) do not require a priori hand-coding 
of such models, but rather acquire them automatically through machine learning 
over large numbers of runs. Also, in the interests of combating computational complexity 
and improving understandability, some researchers emphasize the need for multiple models 
at multiple levels of abstraction, rather than focusing on a single model (Nair et al., 2002b). 
For instance, one level of the model may focus on the analysis of the individual agents' ac- 
tions in support of a team, while another level may focus on interactions among subteams 
of a team. We can potentially extend the COM-MTDP model in both of these directions 
(i.e., machine learning of model parameters, and hierarchical representations of the team to 
provide multiple levels of abstraction). 

Second, it is important to extend COM-MTDP analysis to other aspects of teamwork 
beyond communication. For instance, team formation (where agents may be assigned spe- 
cific roles within the team) and reformation (where failure of individual agents leads to role 
reassignment within the team) are key problems in teamwork that appear suitable for 
COM-MTDP analysis. Such analysis may require extensions to the COM-MTDP framework 
(e.g., explicit modeling of roles). Ongoing research (Nair, Tambe, & Marsella, 2002a) 
has begun investigating the impact of such extensions and their applications in domains 
such as RoboCup Rescue (Kitano, Tadokoro, Noda, Matsubara, Takahashi, Shinjoh, & Shi- 
mada, 1999). Analysis of more complex team behaviors may require further extensions 
to the COM-MTDP model to explicitly account for additional aspects of teamwork (e.g., 
notions of authority structure within teams). 

Third, extending COM-MTDP analysis beyond teamwork to model other types of co- 
ordination may require relaxation of COM-MTDP's assumption of selfless agents receiving 
the same joint reward. More complex organizations may require modeling other non-joint 
rewards. Indeed, enriching the COM-MTDP model in this manner may enable analy- 
sis of some of the seminal work in multiagent coordination in the tradition of PGP and 
GPGP (Decker & Lesser, 1995; Durfee & Lesser, 1991). Such enriched models may first 
require new advances in the mathematical foundations of our COM-MTDP framework, and 
ultimately contribute towards the emerging sciences of agents and multiagent systems. 